ABSTRACT

Self adaptive filters adjust their parameters to perform
an almost optimal filtering operation without a priori
knowledge of the input signal statistics. Two approaches to the
design of efficient self adaptive discrete filtering algorithms
are considered.

For non-recursive (FIR) adaptive filters, simplified
estimates of the gradient of the performance function to be
minimized are considered. These algorithms result in reduced
implementation complexity and improved dynamic operating range,
with about the same misadjustment errors and convergence time
as the classic LMS (Least Mean Squared) algorithm. An analysis
of the simplified gradient approach is presented and confirmed
experimentally for the specific example of an adaptive
line enhancer (ALE). The results are used to compare the
simplified gradient approaches with each other and with the LMS
algorithm. This comparison uses a new graphic presentation
of adaptive filter operating characteristics and a
complexity index. It indicates that the simplified
gradient estimators are superior to the LMS algorithm for
filters of equal complexity.

For recursive (IIR) adaptive filters, a combined random
and gradient search (RGS) algorithm is proposed, analyzed and
tested. Since the performance surface of the IIR filter is
multimodal in the feedback parameters and unimodal in the
feedforward parameters, random search is used to adjust the
feedback parameters and gradient search to adjust the
feedforward parameters. Convergence to the globally optimal
filter parameters is guaranteed for sufficiently long
adaptation time. An estimate of the convergence time of the RGS
algorithm is derived and supported by simulation results for
the ALE example. Finally, a priori knowledge of the optimal
filter structure is taken into account in the formulation of
an improved version of the basic RGS algorithm. This
improvement is confirmed with the ALE example.

I.   INTRODUCTION

1.1.   BACKGROUND 

In  a  broad  sense  the  term  filter  implies  an  operation 
on  an  input  signal  or  collection  of  data  in  order  to  smooth, 
predict,  or  estimate  a  desired  property  hidden  in  the  input. 
Fig.  1.1-1  presents  the  block  diagram  of  a  discrete  time 
linear  recursive  digital  filter.   An  optimal  filter  is  one 
designed  to  be  optimum  or  best  with  respect  to  a  performance 
criterion  that  measures  or  expresses  its  effectiveness.   The 
most  commonly  used  approach  to  optimal  filter  design  is  the 
linear  filter  optimized  with  respect  to  a  Minimum  Mean 
Squared Error (MMSE) criterion, where the error is defined as the
difference  between  the  filter  output  and  a  desired  signal. 

This  optimal  filter  is  usually  called  the  Wiener  filter. 
Filter realization may be for: (a) analog signals and
continuous time, (b) analog signals and discrete time, (c)
digital signals and discrete time. This dissertation is
applicable  to  cases  (b)  and  (c).   A  basic  discussion  of 
discrete  Wiener  filters  is  presented  by  Nahi  [28,  Ch  5]. 
As  expected  the  parameters  of  the  optimal  filter  depend  upon 
properties  of  the  input  and  desired  signals.   For  example, 
the  Wiener  filter  solution  depends  upon  the  second  order 
statistics  of  the  input  signal  and  the  desired  signal. 

Fig. 1.1-1
Discrete Linear Filter

The performance surface describes the filter performance
criterion as a function of its weights (parameters, or
coefficients $a_i$, $b_i$ of Fig. 1.1-1). Each point of the surface
is the value of the performance criterion with specific
weights of the filter. The term performance function will
be used to describe the performance criterion values as a
function of time during the adaptation process. The optimal
filter  weights  are  those  at  the  global  minimum  point  of  the 
performance  surface. 

In  those  cases  where  the  information  (input  statistics) 
needed  to  design  an  optimal  filter  is  not  available,  or  in 
those  cases  where  the  filter  is  required  to  operate  under 
statistically  nonstationary  input  signal  conditions,  the 
usual  optimal  design  approach  is  not  applicable.   In  some  of 
these  cases,  a  self  adaptive  filter  can  be  used  to  overcome 
this lack of information. The adaptive filter tries to
adjust its parameters dynamically to variations in the
statistics of the input signal. For the weight adjustment, or
adaptation, the adaptive filter uses an error signal.
Ideally this error is the difference between the filter
output and a desired signal. In many applications the desired
signal  is  not  available  per  se,  so  that  a  reference  signal, 
related  to  the  desired  signal  in  some  way,  is  used  to  develop 
the  error  signal.   Fig.  1.1-2  presents  a  block  diagram  of  an 
adaptive  filter  with  its  input,  output  and  reference  signals. 
The  adaptive  filter  thus  includes  a  signal  processing  section 
which  is  similar  to  a  non-adaptive  filter,  except  that  the 
filter weights are adjustable and controlled by the second
portion of the adaptive filter, namely the weight adaptation
algorithm. The weight adaptation algorithm typically
estimates the gradient of the performance surface and
adjusts the weights in the direction of steepest descent.
For a statistically stationary situation, after some
transient, the adaptive filter can be expected to reach a
steady-state condition at which the parameters jitter around the
minimum point of the performance surface.

The generation of the reference signal is a key
consideration in adaptive filter implementation. There are
various practical methods, as discussed in [1, 2, 3, 7, 22,
24, 26, 29, 32, 37, 38, 39]. In many of these applications
the reference signal is not identical to the signal we would
like to have as the output of the filter, because if we had the
desired output we would not need the filter. In spite of
the approximations involved, the adaptive filter is still
able to operate and optimize the weights in many practical
applications.

This dissertation investigates two approaches to
efficient adaptive filters. Chapter II discusses simplified
gradient estimation methods for non-recursive filters and
Chapter III discusses recursive filters based on a combined
random  and  gradient  search  adaptation  technique. 

1.2   FIR* ADAPTIVE FILTERS

The FIR filter is the simplest form of digital filter.
The processing operation produces an output which is the
linear sum of weighted delayed input samples. The impulse
response of this filter is given by the sequence of values
of the filter weights. Because of its relative simplicity,
the FIR adaptive filter historically was the starting point
for the development of adaptive filters.

A very important property of this filter is that its
performance surface is quadratic, so it has one and only one
minimum, i.e. it is a unimodal surface, as shown by Widrow [1].
For a unimodal surface, a gradient minimum seeking algorithm
will converge to the minimum (a formal proof is presented in
[1]), and this property is the key to the success of the
Least Mean Squared (LMS) algorithm, discussed later.
Interest in the area of adaptive filtering started in the late
50's and early 60's. The most successful approach is Widrow's
LMS algorithm. Widrow in [1] presents the classic LMS
algorithm and summarizes most of the previous work on the subject.
The LMS algorithm and its basic properties are presented
later. In [3] Widrow et al introduce the concept of noise
cancelling, which uses a reference signal that is related only
to the noise to estimate the noise portion of the input. The


* FIR (Finite Impulse Response) and IIR (Infinite Impulse
Response) are generally used by the signal processing community
to denote non-recursive and recursive filters respectively
and  are  so  used  in  this  work.   It  is  noted  though,  that  some 
recursive  filters  can  have  a  finite  impulse  response. 
output is produced by subtracting the noise estimate from
the input signal. In [4] Widrow et al extend the analysis
to non-stationary operation of the LMS algorithm. In
this study they identify two sources of misadjustment (a
measure of the distance of the actual steady-state error
from the optimal steady-state error) with a nonstationary
input signal. The first is due to gradient estimation
errors (or gradient noise), which also exist with stationary
inputs. The second is due to the changing statistics, which
cause the filter weights to lag behind the changing
optimal solution. This analysis gives some insight into the
problem and provides basic design information. In [5]
Widrow and McCool present a random search FIR filter and
compare it to the LMS algorithm. Using the unimodal
property of the FIR filter they modify the random search
algorithm so that points with a high performance function value
(which in regular random search methods are discarded)
contribute to convergence towards the optimum. Their
conclusion is that the LMS is a better algorithm: it converges
faster and produces less steady-state misadjustment. In
[6] Widrow et al present versions of the LMS algorithm
that operate on complex data. This concept has recently
become important because of the use of adaptive techniques
in the frequency domain, Dentino [16] and Zentner [17].
Lucky [7] introduces a Minimum Magnitude performance
criterion to derive an adaptive equalizer. Digital
communication systems use equalizers to reduce the
intersymbol interference in a communication channel. Lucky's
solution involves transmission of a special training
sequence which is known at the receiver and is used there as
the reference signal. Another interesting point in his
solution is the use of quantized variables in the adaptation
algorithm.

Finally, Frost [26], Owsley [29], Widrow et al [38],
Griffiths and Jim [41] and many others discuss the use of
the LMS algorithm for adaptive control of sensor beamforming
arrays. We will not discuss these applications in this
dissertation because of their specialized nature. However,
it is noted that the simplified algorithms presented here are
general and may be used to advantage in antenna arrays.

From the references the importance of the LMS algorithm
is very clear. Surprisingly enough, very little has been done to
improve the basic algorithm, the emphasis being primarily on
applications of the concept. Gersho [40] discusses adaptation
in a quantized parameter space. Gersho's discussion is of a
general nature, i.e. no specific performance criterion is
assumed, and his main result is that for unimodal performance
surfaces and a deterministic gradient (i.e. no need for
stochastic gradient estimation), the quantized algorithm will
converge to the neighborhood of the optimal solution.

Moschner [27] is the only published attempt to derive
computationally more efficient versions of the basic LMS
weight adaptation algorithm, and these results have not
been used in practice. Griffiths and Jim in a recent paper
[41] discuss a simplified adaptive system from another point
of view. Their concern is to simplify the signal processing
section in order to achieve high frequency operation. They
propose a 3 level weight quantization, with no
multiplications in the signal processing portion. The resulting
weight adaptation scheme is based on the LMS algorithm, and
it is necessary to store past quantizations. Hence it is
more complicated, but the goal of high frequency operation
is achieved.

Summary  of  LMS  Algorithm 

Because of its importance, the LMS adaptive algorithm is
presented here following the basic references [1, 2, 3, 4, 5].
The basic filter output is given by:


$$y(k) = \sum_{i=0}^{N_a-1} a_i(k)\,x(k-i) \tag{1.2-1}$$


where: $k$ is the time index

$N_a$ is the number of filter weights

$a_i(k)$ is the $i$th weight at time $k$

$x(k) = s(k) + n(k)$ is the input signal consisting of
desired signal $s(k)$ and additive noise $n(k)$.
We  want  to  minimize  the  performance  function: 


$$J(k) = E\{\varepsilon_s^2(k)\} = E[\{y(k) - s(k)\}^2] \tag{1.2-2}$$

where: $\varepsilon_s(k) = y(k) - s(k)$ is the error.

In order to perform the adaptation algorithm we need the
gradient of the performance surface:

$$\nabla_{a_i}(k) = \frac{\partial J(k)}{\partial a_i}, \qquad i = 0, 1, \ldots, N_a - 1 \tag{1.2-3}$$


In practice we do not have $J(k)$, since $s(k)$ is not known, nor
do we have an ensemble of processes to perform the
expectation operation of (1.2-2). Thus we must use an estimate
of the performance function:

$$\hat{J}(k) = \varepsilon_r^2(k) = \{y(k) - r(k)\}^2 \tag{1.2-4}$$

where $r(k)$ is a reference signal, not necessarily identical
to $s(k)$.

The  gradient  estimate  is  given  by: 


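$$\hat{\nabla}_{a_i}(k) = \frac{\partial \hat{J}(k)}{\partial a_i} = \frac{\partial \varepsilon_r^2(k)}{\partial a_i} = 2\varepsilon_r(k)\frac{\partial \varepsilon_r(k)}{\partial a_i} = 2\varepsilon_r(k)\frac{\partial y(k)}{\partial a_i} = 2\varepsilon_r(k)\,x(k-i) \tag{1.2-5}$$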

Using the gradient estimate of (1.2-5), the LMS weight
adaptation is given by:

$$a_i(k+1) = a_i(k) - \mu_a \hat{\nabla}_{a_i}(k) = a_i(k) - 2\mu_a\,\varepsilon_r(k)\,x(k-i), \qquad i = 0, 1, \ldots, N_a - 1 \tag{1.2-6}$$

where $\mu_a$ is the adaptation gain controlling the convergence
and steady-state properties of the filter.
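
As an illustration, the filter output (1.2-1) and the weight update (1.2-6) can be sketched in a few lines of Python. This is a minimal sketch and not code from the referenced works: the function names, the noiseless system identification test in `run_lms_demo`, and its target weights are assumptions chosen only to exercise the update.

```python
import random


def fir_output(weights, x_hist):
    """Filter output y(k) = sum_i a_i(k) x(k-i), as in (1.2-1).

    x_hist holds x(k), x(k-1), ..., x(k-Na+1).
    """
    return sum(a * x for a, x in zip(weights, x_hist))


def lms_step(weights, x_hist, r, mu):
    """One LMS update following (1.2-6); returns (new_weights, error)."""
    y = fir_output(weights, x_hist)
    err = y - r  # error with respect to the reference: y(k) - r(k)
    new_weights = [a - 2.0 * mu * err * x for a, x in zip(weights, x_hist)]
    return new_weights, err


def run_lms_demo(mu=0.01, n_steps=5000, seed=0):
    """Hypothetical test: identify a known 4-weight FIR system.

    The input is unit variance white Gaussian noise and the reference
    is the noiseless output of the (assumed) target system.
    """
    rng = random.Random(seed)
    target = [0.5, -0.3, 0.2, 0.1]  # assumed values, for illustration only
    weights = [0.0] * len(target)
    x_hist = [0.0] * len(target)
    for _ in range(n_steps):
        x_hist = [rng.gauss(0.0, 1.0)] + x_hist[:-1]
        r = fir_output(target, x_hist)
        weights, _ = lms_step(weights, x_hist, r, mu)
    return weights, target
```

With a unit variance white input and four weights, the gain of 0.01 used in the demonstration is far below $1/[N_a R_{xx}(0)] = 0.25$, so the weights settle close to the target values.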

Reference [4] assumes a stationary input with uncorrelated
samples and derives formulas for the stability region,
convergence time, and misadjustment as follows.
Stable convergence of the adaptation algorithm is limited to
values of $\mu_a$ given by:

$$0 < \mu_a < 1/[N_a R_{xx}(0)] \tag{1.2-7}$$


where $R_{xx}(m) = E\{x(k)x(k-m)\}$ is the autocorrelation function
of the input. Equation (1.2-7) was derived using the mean
of the gradient estimate. So, in practice, in order to be
stable at all times we need

$$\mu_a \ll 1/[N_a R_{xx}(0)].$$


The approximate Mean Squared Error (MSE) convergence time
constant is given by

$$\tau_{MSE} = 1/[4\mu_a R_{xx}(0)] \tag{1.2-8}$$


The misadjustment, M, is defined as the ratio of the excess
Mean Squared Error (MSE), due to adaptive filter steady-state
jitter around the optimal solution, to the minimum
MSE:

$$M = \frac{J_{ss} - J_{\min}}{J_{\min}} = \mu_a N_a R_{xx}(0) \tag{1.2-9}$$

where

$$J_{ss} = J_{\text{steady-state}} = \lim_{k \to \infty} J(k)$$

$$J_{\min} = J_{ss}\,[a_i = a_i\ \text{optimal}] = \text{minimum MSE}$$


The  misadjustment  estimate  (1.2-9)  was  derived  for  an  ideal 
reference  signal,  r(k)  =  s(k),  and  does  not  apply  to  cases 
of  noisy  reference. 
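
The three design estimates above are easily evaluated numerically. The sketch below assumes the uncorrelated-sample formulas (1.2-7) and (1.2-8), and reads the misadjustment estimate (1.2-9) as $M = \mu_a N_a R_{xx}(0)$, the usual LMS result. The function name and the unit input power used in the example are assumptions.

```python
def lms_design_estimates(mu, n_weights, rxx0):
    """Evaluate the LMS design estimates for a given adaptation gain.

    mu        -- adaptation gain
    n_weights -- number of filter weights
    rxx0      -- input power, i.e. the autocorrelation at lag zero
    """
    return {
        # upper limit of the stability region, (1.2-7)
        "mu_max": 1.0 / (n_weights * rxx0),
        # MSE convergence time constant in samples, (1.2-8)
        "tau_mse": 1.0 / (4.0 * mu * rxx0),
        # misadjustment, assuming M = mu * Na * Rxx(0) for (1.2-9)
        "misadjustment": mu * n_weights * rxx0,
    }


# Example: 15 weights and a gain of 0.0005, with an assumed unit input power.
estimates = lms_design_estimates(mu=0.0005, n_weights=15, rxx0=1.0)
```

The example shows the basic trade-off: lowering the gain reduces the misadjustment in direct proportion but lengthens the convergence time constant by the same factor.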

The derivations in [1, 2, 3, 4, 5] are based upon the
use of eigenvalue and eigenvector analysis. To obtain practical
estimation formulas the eigenvalue based equations are
approximated by correlation functions. The analysis presented
in this dissertation makes the approximations at the start
of the derivations and uses correlation functions throughout.
The advantage of this approach is that it provides better
insight into the nature of the approximations.

1.3   THE IIR* ADAPTIVE FILTER

An  IIR  filter  uses  previous  output  values  to  compute 
the  present  filter  output: 


$$y(k) = \sum_{i=0}^{N_a-1} a_i\,x(k-i) + \sum_{i=1}^{N_b} b_i\,y(k-i) \tag{1.3-1}$$


Because  of  the  feedback  in  (1.3-1)  the  impulse  response  may 
be  infinite  and  is  designated  IIR. 
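
The recursion (1.3-1) can be illustrated with a direct implementation. This is only a sketch with fixed weights and zero initial conditions; the function name and the one-pole example are assumptions.

```python
def iir_filter(a, b, x):
    """Compute y(k) of (1.3-1) for an input sequence x.

    a -- feedforward weights a_0 ... a_{Na-1}
    b -- feedback weights b_1 ... b_{Nb} (b[0] is b_1)
    Samples before k = 0 are taken as zero.
    """
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, a_i in enumerate(a):
            if k - i >= 0:
                acc += a_i * x[k - i]
        for i, b_i in enumerate(b, start=1):
            if k - i >= 0:
                acc += b_i * y[k - i]
        y.append(acc)
    return y


# Impulse response of a one-pole filter: each output is half the previous one,
# so the response never becomes exactly zero.
impulse_response = iir_filter([1.0], [0.5], [1.0, 0.0, 0.0, 0.0, 0.0])
```

With an empty list of feedback weights the same function reduces to the FIR filter of (1.2-1) with fixed weights.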

Because of inherent savings due to the use of previously
calculated values (the existence of poles in the transfer
function), the IIR filter is the most efficient filtering
scheme for many applications.

Since  it  uses  feedback,  the  IIR  filter  can  be  unstable. 
This  presents  a  design  problem  for  the  conventional  IIR 
filter, and a basic requirement for an IIR adaptation algorithm
is to assure that the resulting filter is stable. A second
disadvantage of the IIR adaptive filter is the multimodal
nature of its performance surface, as discussed in section 3.1.
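
The stability requirement can be made concrete for the case of two feedback weights. With the sign convention of (1.3-1), the poles are the roots of $z^2 - b_1 z - b_2 = 0$, and the filter is stable when both lie inside the unit circle. The sketch below is an illustration under that assumption; the function names are hypothetical and it is not the stability test used in the referenced algorithms.

```python
import cmath


def second_order_poles(b1, b2):
    """Poles of a filter with feedback y(k) = b1*y(k-1) + b2*y(k-2) + ...

    These are the roots of z**2 - b1*z - b2 = 0.
    """
    disc = cmath.sqrt(b1 * b1 + 4.0 * b2)
    return ((b1 + disc) / 2.0, (b1 - disc) / 2.0)


def is_stable(b1, b2):
    """True when both poles lie strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in second_order_poles(b1, b2))
```

For example, `is_stable(1.2, -0.81)` is true (a complex pole pair of magnitude 0.9), while `is_stable(1.0, 0.5)` is false because one real pole lies outside the unit circle. An adaptation algorithm could apply such a check before accepting a new set of feedback weights.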

White [8] was the first to suggest the use of IIR
structures for an adaptive filter. He indicates a possible use of
several performance criteria and derives the gradient
expression for the Minimum Mean Squared Error (MMSE) performance
criterion. In [9], Stearns et al present an all adaptive IIR
filter. Stearns' algorithm is rather complex, i.e. the number
of operations (multiplications and additions) is proportional
to $(\alpha_1 N_a N_b + \alpha_2 N_b^2)$, compared to the relative simplicity of
the LMS where the number of operations is proportional to $N_a$.
Stearns' algorithm is discussed later and its gradient
estimation method is presented in detail.

In [10] Feintuch presents a much simpler adaptive IIR
filter which consists of two LMS adaptive sections: one controls
the adaptation of the feedforward weights and the second controls
the adaptation of the feedback weights. The gradient estimation
method of Feintuch's algorithm is presented later in this section.
Feintuch's algorithm works in some cases but, as pointed out
by several investigators [11, 12], the derivation has errors
and the filter, at least in the examples presented in [11],
does not converge to the optimal solution.

In [13] Parikh and Ahmed use the same examples presented
in [11] to demonstrate the convergence properties of Stearns'
algorithm. Reference [13] shows that Stearns' algorithm does
converge to a minimum point, but with a multimodal performance
surface the steady state might be around a local minimum or the
global minimum, depending upon the starting point of the
adaptation process. McMurray [14] investigates the dependence of
the stability of Feintuch's algorithm on the values of its
adaptation gains. The region of stable operation turns out to be a
triangle in the adaptation gains space. In [15] McMurray
investigates the convergence time of Feintuch's algorithm for IIR
filtering of narrow band signals and compares operation in the
time and frequency domains. In both cases the convergence
time is inversely proportional to the square root of the
product of four factors: feedforward adaptation gain,
feedback adaptation gain, number of feedforward weights, and
number of feedback weights. An additional conclusion is
that the convergence time is shorter for time domain
operation. Parker and Ko [18] extend the adaptive IIR
filter to image processing. In [35] Treichler, Larimore
and Johnson modify Feintuch's algorithm by passing the error
term through an FIR filter. This modification allows for
convergence to a minimum (not necessarily global), and its use is
limited by the information needed for the design of the error
term filter. The existing IIR adaptive algorithms are based
upon Stearns' and Feintuch's algorithms, which are summarized
briefly in the following.

In  order  to  have  a  practical  adaptation  method  we  use  a 
performance  function  estimate: 


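$$\hat{J}(k) = \varepsilon_r^2(k) = \{y(k) - r(k)\}^2 \tag{1.3-2}$$

where $r(k)$ is the reference signal and $y(k)$ is given by
(1.3-1) with weights $a_i(k)$ and $b_i(k)$ being functions of time.
The gradient estimate is now given by:

$$\hat{\nabla}_{a_i}(k) = \frac{\partial \varepsilon_r^2(k)}{\partial a_i} = 2\varepsilon_r(k)\frac{\partial \varepsilon_r(k)}{\partial a_i} = 2\varepsilon_r(k)\frac{\partial y(k)}{\partial a_i} \tag{1.3-3}$$

The derivative $\partial y(k)/\partial a_i$ is not as simple as in (1.2-5) because
of the feedback terms such as $b_j\,y(k-j)$ present in $y(k)$.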
The final form of the gradient estimates of Stearns' algorithm
is given by:

$$\hat{\nabla}_{a_i}(k) = 2\varepsilon_r(k)\,\alpha_i(k), \qquad i = 0, 1, \ldots, N_a - 1 \tag{1.3-4}$$

where:

$$\alpha_i(k) = x(k-i) + \sum_{j=1}^{N_b} b_j(k)\,\alpha_i(k-j) \tag{1.3-5}$$

and:

$$\hat{\nabla}_{b_i}(k) = 2\varepsilon_r(k)\,\beta_i(k), \qquad i = 1, 2, \ldots, N_b \tag{1.3-6}$$

where:

$$\beta_i(k) = y(k-i) + \sum_{j=1}^{N_b} b_j(k)\,\beta_i(k-j) \tag{1.3-7}$$

Equations (1.3-4, 5, 6, 7) are the gradient estimates of
Stearns' algorithm. Feintuch's algorithm [10] uses only
the first terms in the expressions for $\alpha_i$ (1.3-5) and $\beta_i$
(1.3-7), and the resulting gradient estimates are:

$$\hat{\nabla}_{a_i}(k) = 2\varepsilon_r(k)\,x(k-i) \tag{1.3-8}$$

$$\hat{\nabla}_{b_i}(k) = 2\varepsilon_r(k)\,y(k-i) \tag{1.3-9}$$


With both algorithms the weight adaptation is given by:

$$a_i(k+1) = a_i(k) - \mu_a \hat{\nabla}_{a_i}(k), \qquad i = 0, 1, \ldots, N_a - 1 \tag{1.3-10}$$

$$b_i(k+1) = b_i(k) - \mu_b \hat{\nabla}_{b_i}(k), \qquad i = 1, 2, \ldots, N_b \tag{1.3-11}$$

where $\mu_a$ and $\mu_b$ are the feedforward and feedback adaptation
gains. These adaptive IIR filtering schemes are not
satisfactory solutions to the IIR filtering problem. Stearns'
algorithm is not satisfactory for the following
reasons:

-  The  instability  problem  mentioned  by  Elliott,  Jacklin 
and  Stearns,  [25];  this  problem  is  discussed  later. 

-  The algorithm does not assure convergence to the global
minimum.

-  It  is  a  complicated  algorithm. 

The  Feintuch  algorithm  is  not  satisfactory  mainly  because,  in 
some  cases  it  fails  to  converge  to  a  minimum  point,  and  in 
all  cases  does  not  assure  convergence  to  the  global  minimum 
of  the  performance  surface. 
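
The difference between the two gradient estimates can be seen in a short sketch. It assumes that the quantities of (1.3-5) and (1.3-7) are the regressors $x(k-i)$ and $y(k-i)$ plus a recursive correction through the current feedback weights, and that Feintuch's algorithm keeps only the regressors, as in (1.3-8) and (1.3-9). The function name, argument order and zero initial conditions are assumptions, and no stability check is made on the feedback weights.

```python
def adaptive_iir(x, r, n_a, n_b, mu_a, mu_b, stearns=True):
    """Adapt the IIR filter of (1.3-1) with the updates (1.3-10), (1.3-11).

    stearns=True uses the recursive gradient terms of (1.3-5) and (1.3-7);
    stearns=False keeps only their first terms (Feintuch's estimates).
    Returns (feedforward weights, feedback weights, error sequence).
    """
    a = [0.0] * n_a        # a_i, i = 0 .. Na-1
    b = [0.0] * n_b        # b_i, i = 1 .. Nb (stored at index i-1)
    x_hist = [0.0] * n_a   # x(k), x(k-1), ..., x(k-Na+1)
    y_hist = [0.0] * n_b   # y(k-1), ..., y(k-Nb)
    # alpha_hist[i][j-1] holds alpha_i(k-j); beta_hist[i-1][j-1] holds beta_i(k-j)
    alpha_hist = [[0.0] * n_b for _ in range(n_a)]
    beta_hist = [[0.0] * n_b for _ in range(n_b)]
    errors = []
    for xk, rk in zip(x, r):
        x_hist = [xk] + x_hist[:-1]
        y = (sum(ai * xi for ai, xi in zip(a, x_hist))
             + sum(bi * yi for bi, yi in zip(b, y_hist)))
        err = y - rk
        alpha = list(x_hist)   # first terms: x(k-i)
        beta = list(y_hist)    # first terms: y(k-i)
        if stearns:
            alpha = [alpha[i] + sum(bj * h for bj, h in zip(b, alpha_hist[i]))
                     for i in range(n_a)]
            beta = [beta[i] + sum(bj * h for bj, h in zip(b, beta_hist[i]))
                    for i in range(n_b)]
        a = [ai - mu_a * 2.0 * err * al for ai, al in zip(a, alpha)]
        b = [bi - mu_b * 2.0 * err * be for bi, be in zip(b, beta)]
        alpha_hist = [[al] + h[:-1] for al, h in zip(alpha, alpha_hist)]
        beta_hist = [[be] + h[:-1] for be, h in zip(beta, beta_hist)]
        y_hist = [y] + y_hist[:-1]
        errors.append(err)
    return a, b, errors
```

The extra work in the Stearns branch is the pair of nested sums, one over $N_a N_b$ terms and one over $N_b^2$ terms per sample, which is the source of the complexity noted earlier. As long as the feedback weights are zero the two variants produce identical updates; they diverge once the feedback weights move away from zero.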

1.4   INTRODUCTION TO ADAPTIVE FIR FILTERS USING SIMPLIFIED
GRADIENT  ESTIMATIONS 

The LMS algorithm is being used in many adaptive
filtering applications [1-6, 16, 17, 22, 24, 26, 29, 32, 34, 37,
38, 39, 41], with satisfactory results. The possibility of
using simplified algorithms, with hardware and time savings,
has not received much attention. Gersho [40], Moschner
[27], and recently Griffiths and Jim [41] (who discuss the
somewhat different problem of simplifying the signal processing
portion at the cost of a more complicated adaptation algorithm)
appear to be the only publications in this area. All applications,
except [41], seem to select the classical LMS algorithm and
not a simplified version. A possible reason for this
might be a lack of confidence in the performance of a
simplified algorithm, compared with the many satisfactory
results obtained with the LMS algorithm. This
dissertation demonstrates analytically, and by extensive
simulation, the advantages and savings associated with the
use of the simplified algorithms. One natural simplified
algorithm investigated here is the use of a positive or
negative Fixed Step Correction (FSC) in the adaptation, instead
of the LMS correction which is proportional to the value of
the gradient. This gradient estimate is given by:


$$\hat{\nabla}_{FSC}(i,k) = \mathrm{Sgn}\{\hat{\nabla}_{LMS}(i,k)\} = \mathrm{Sgn}\{\varepsilon(k)\}\,\mathrm{Sgn}\{x(k-i)\} \tag{1.4-1}$$

where:

$$\mathrm{Sgn}[\cdot] = \begin{cases} 1 & \text{if } [\cdot] > 0 \\ -1 & \text{if } [\cdot] < 0 \end{cases}$$


The second algorithm investigated here uses a modified
FSC with the step size proportional to the magnitude of the
error. This algorithm is called here the Simplified LMS
(SLMS); Moschner [27] called it the clipped LMS. The SLMS
has the following gradient estimate:

$$\hat{\nabla}_{SLMS}(i,k) = \varepsilon_r(k)\,\mathrm{Sgn}\{x(k-i)\} \tag{1.4-2}$$


Chapter II discusses these algorithms and presents an analysis
and simulation of adaptive FIR filter operation using these
algorithms.
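
The three gradient estimates differ only in how much of the error and input magnitude they retain, which a short sketch makes explicit. The function names are assumptions, and returning zero for a zero argument of the sign function is also an assumption, since the definition above leaves that case open.

```python
def sgn(v):
    """Sign function; the value 0.0 at v == 0 is an assumption of this sketch."""
    if v > 0:
        return 1.0
    if v < 0:
        return -1.0
    return 0.0


def grad_lms(err, x_delayed):
    """LMS gradient estimate, 2 * error * x(k-i), as in (1.2-5)."""
    return 2.0 * err * x_delayed


def grad_fsc(err, x_delayed):
    """Fixed Step Correction estimate (1.4-1): only the signs are used."""
    return sgn(err) * sgn(x_delayed)


def grad_slms(err, x_delayed):
    """Simplified (clipped) LMS estimate (1.4-2): error times the input sign."""
    return err * sgn(x_delayed)


def adapt(weights, x_hist, r, mu, grad):
    """One weight update a_i(k+1) = a_i(k) - mu * grad for all weights.

    x_hist holds x(k), x(k-1), ...; grad is one of the estimators above.
    """
    y = sum(a * x for a, x in zip(weights, x_hist))
    err = y - r
    return [a - mu * grad(err, x) for a, x in zip(weights, x_hist)], err
```

For instance, `adapt([0.0, 0.0], [1.0, -2.0], 1.0, 0.1, grad_fsc)` moves each weight by exactly the gain, giving `[0.1, -0.1]`, regardless of the sizes of the error and the input samples. The FSC and SLMS estimators need no multiplication of two data values, which is where the hardware saving comes from.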

The optimal Wiener filter depends upon the statistics of
the input signal and the desired signal, and the steady-state
behavior of an adaptive filter depends upon the corresponding
statistics. Since the desired signal is not available to the
adaptive filter, which uses a reference signal that is only
related to the desired signal, the
properties of this filter differ depending upon the application
and the manner in which the reference signal is provided. In
Chapter II an adaptive FIR filter is used as an adaptive line
enhancer (ALE) [3, 34, 37, 39], which is a typical signal
processing application and utilizes a noisy reference.
Appendix A describes the simulation details.

The  discussion  in  Chapter  II  includes  for  each  algorithm 
the  following  topics: 

-  convergence  and  stability,  section  2.2. 

-  convergence time (TC), section 2.3.

-  steady-state misadjustment (M), section 2.4.

-  implementation complexity, section 2.1.

-  dynamic range, section 2.6.

Sections 2.3 and 2.4 include derivations of estimation
formulas for the convergence time, TC, and misadjustment, M,
of the FSC and SLMS algorithms.

The simulation experiment, described in Appendix A, shows
good agreement with these misadjustment and convergence time
formulas.

Fig. 1.4-1 presents a typical operation of the adaptive
FIR filter with the LMS, FSC, and SLMS algorithms. This
figure shows a typical weight, $a_1$, and the Mean Squared
Error (MSE), as a function of time for the three algorithms
as noted. Each plot shows the optimal value of
the weight or the MSE, an ensemble average of 100 runs, and
the convergence of an individual filter (single run). In
Fig. 1.4-1 all of the algorithms perform, on the average,
about the same.

For more accurate comparison, a graphic presentation of
adaptive filter properties is introduced in Section 2.1. This
graphic presentation, the Adaptive Filter Operating
Characteristic (AFOC), is used to compare equal degree and equal
complexity filters with different algorithms.

Fig. 1.4-1
Typical FIR adaptive filter operation with $N_a = 15$, $\mu_{LMS} = 0.0005$,
$\mu_{FSC} = \mu_{SLMS} = 0.001$, ALE experiment

Looking ahead to Fig. 2.1-1, it is apparent that the
simplified gradient algorithms (FSC and SLMS), when compared
to the LMS algorithm with equal complexity (cost) and equal
convergence time, are more effective and provide more
processing gain (processing gain is defined later as a measure
of filter effectiveness).

1.5   INTRODUCTION TO ADAPTIVE IIR FILTERS USING RANDOM
SEARCH  TECHNIQUES 

Adaptive IIR filters based on gradient methods have one
major disadvantage, which is the multimodal structure of the
performance surface as discussed in section 3.1. Thus there
is no inherent way to assure that steepest descent gradient
adaptation converges to the global minimum. The convergence problem
and additional disadvantages of Stearns' and Feintuch's
algorithms, as discussed in section 1.3, suggest that
gradient methods may not be the best adaptation scheme for the
IIR filter. Thus a different adaptation technique, namely
random search, is considered here. The basic concept of
random search is discussed in section 3.2.

A random search IIR filter is presented and discussed
in section 3.3. It is concluded there that this scheme is
not satisfactory. The fact that the IIR filter's performance
surface is quadratic in the feedforward weights (Elliott et
al [25]) is the key to the hybrid Random and Gradient
Search (RGS) algorithm developed in section 3.4. This new
algorithm provides for satisfactory operation of an IIR
adaptive filter. Convergence analysis of the RGS algorithm
and convergence time estimation for typical signal
processing situations are given in section 3.5.

For cases where information is available on the structure
of the optimal filter, a constrained, or a priori structured,
filter algorithm can be implemented. This concept is
discussed in section 3.6 and shows good results. Fig. 1.5-1
presents the error convergence of four filters: the LMS FIR
filter with 20 weights, an RGS IIR filter, an a priori
structure adaptive pole (ASPOL, section 3.6) IIR filter, and a
Feintuch algorithm IIR filter. The IIR filters have two
feedback and three feedforward weights. For this example it
is seen that:

1.  The LMS algorithm converges fastest.

2.  The RGS converges more slowly but reaches a lower
steady-state error.

3.  The ASPOL converges to the lowest steady-state error,
faster than the RGS.

4.  The Feintuch algorithm converges to the highest
steady-state error.

These examples are typical.
Fig. 1.5-1
Error Convergence For Several Adaptive Filters

(Figure legend: p is a parameter constant (pole magnitude);
R is the number of output samples used in estimating the
performance surface value for a fixed set of parameters
(random search interval).)
II.   ADAPTIVE FIR FILTERS USING SIMPLIFIED
GRADIENT ESTIMATIONS

2.1   TWO  SIMPLIFIED  GRADIENT  ALGORITHMS 

Two  simplified  gradient  algorithms  are  considered: 

(a)   The Fixed Step Correction (FSC) adaptation scheme
is given by:

$$ a_i(k+1) = a_i(k) - \mu_a\,\mathrm{Sgn}\{\nabla(i,k)\} \tag{2.1-1} $$

This formulation is essentially binary and was motivated by
the general success of bang-bang type controllers. The
adaptation gain $\mu_a$ is the size of the fixed correction step.

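We define the FSC gradient estimate as:

$$ \nabla_{FSC}(i,k) = \mathrm{Sgn}\{\nabla_{LMS}(i,k)\} = \mathrm{Sgn}\{\varepsilon(k)\}\,\mathrm{Sgn}\{x(k-i)\} \tag{2.1-2} $$

It should be noted that the sign of the gradient,
$\mathrm{Sgn}\{\nabla(i,k)\} = \mathrm{Sgn}\{x(k-i)\}\,\mathrm{Sgn}\{\varepsilon(k)\}$,
is identical for both error magnitude and mean square error
estimates, that is

$$ \mathrm{Sgn}\left\{\frac{\partial|\varepsilon(k)|}{\partial a_i}\right\} = \mathrm{Sgn}\left\{\frac{\partial\varepsilon^2(k)}{\partial a_i}\right\} $$

so that (2.1-2) can be derived from either error magnitude
or mean squared error.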
Large correction steps result in fast convergence to the
steady state near the optimal filter weights, but with a large
steady-state jitter around the optimal filter. These conflicting
effects call for an engineering compromise in choosing the size
of the correction step $\mu_a$.

(b)   The second approach is to use a variable size
step. A natural possibility is to consider

$$ \mu_a = \mu'\,|\varepsilon(k)| \tag{2.1-3} $$

The combination of (1.2-2), (2.1-1), and (2.1-3) gives:

$$ a_i(k+1) = a_i(k) - \mu'\,|\varepsilon(k)|\,\mathrm{Sgn}\{\varepsilon(k)\}\,\mathrm{Sgn}\{x(k-i)\} \tag{2.1-4} $$

Since $|\varepsilon(k)|\,\mathrm{Sgn}\{\varepsilon(k)\} = \varepsilon(k)$, we can
use the regular adaptation gain symbol $\mu_a$ instead of
$\mu'$ and write

$$ a_i(k+1) = a_i(k) - \mu_a\,\varepsilon(k)\,\mathrm{Sgn}\{x(k-i)\} \tag{2.1-5} $$

(2.1-5) is the simplified LMS (SLMS) algorithm with the
gradient estimate given by:

$$ \nabla_{SLMS}(i,k) = \varepsilon(k)\,\mathrm{Sgn}\{x(k-i)\} \tag{2.1-6} $$




Typical operation of the LMS, FSC and SLMS algorithms is
presented in Fig. 1.4-1.
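The three update rules can be compared side by side in a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the LMS gradient estimate is taken as 2 e(k) x(k-i) (the form that appears in (2.3-5)), and Sgn{0} is taken as 0, which the text does not specify.

```python
def sgn(v):
    """Sign function: +1, -1, or 0 for v == 0 (the zero case is an assumption)."""
    return (v > 0) - (v < 0)


def gradient_estimate(err, x_i, scheme):
    """Gradient estimate for one weight, given e(k) and the sample x(k-i)."""
    if scheme == "LMS":
        return 2.0 * err * x_i          # classic LMS estimate, as in (2.3-5)
    if scheme == "FSC":
        return sgn(err) * sgn(x_i)      # fixed step correction, (2.1-2)
    if scheme == "SLMS":
        return err * sgn(x_i)           # simplified LMS, (2.1-6)
    raise ValueError(f"unknown scheme: {scheme}")


def update_weights(a, x_vec, err, mu, scheme):
    """One iteration a_i(k+1) = a_i(k) - mu * gradient_estimate for all i.

    a     -- current weights a_i(k), i = 0..N_a-1
    x_vec -- input samples x(k-i), i = 0..N_a-1
    err   -- instantaneous error e(k)
    mu    -- adaptation gain
    """
    return [a_i - mu * gradient_estimate(err, x_i, scheme)
            for a_i, x_i in zip(a, x_vec)]
```

Note that the FSC update needs no multiplication by the error or the input, and the SLMS update needs only one product per weight, which is the source of the implementation savings discussed in this chapter.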

A useful graphic presentation of adaptive filter properties
is given by a plot of processing gain (PG) as a function
of convergence time (TC). The processing gain measures the
filter effectiveness and is defined as:

$$ PG = 10 \log\left[R_{nn}(0)/J_{ss}\right] \tag{2.1-7} $$

where $R_{nn}(0)$ is the input noise power and $J_{ss}$, defined in
(1.2-9), is the output error power.

The convergence time, TC, is the time required to remove
90% of the initial excess MSE. The value of the performance
function at the time TC is:

$$ J(TC) = J_{ss} + 0.1\,[J(0) - J_{ss}] \tag{2.1-8} $$

This plot, named the Adaptive Filter Operating Characteristic
(AFOC), can be used for design when the number of filter
weights, $N_a$, is a parameter. It also provides a method of
comparison for different adaptation schemes. Curves for the
LMS, FSC, and SLMS algorithms are presented in Fig. 2.1-1A
for the ALE experiment of Appendix A.
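One point of an AFOC curve can be obtained from a simulated learning curve along the following lines. This is a minimal sketch, assuming the logarithm in (2.1-7) is base 10 (so that PG is in dB) and that the learning curve J(k) is available as a list of averaged squared errors; the function names are hypothetical.

```python
import math


def processing_gain_db(noise_power, j_ss):
    """PG = 10 log10(R_nn(0) / J_ss), following (2.1-7)."""
    return 10.0 * math.log10(noise_power / j_ss)


def convergence_time(j_curve, j_ss):
    """First iteration k with J(k) <= J_ss + 0.1 [J(0) - J_ss], following (2.1-8).

    j_curve -- learning curve J(0), J(1), ... (e.g. an average over many runs)
    j_ss    -- steady-state value of the performance function
    Returns None if the threshold is never reached.
    """
    threshold = j_ss + 0.1 * (j_curve[0] - j_ss)
    for k, j in enumerate(j_curve):
        if j <= threshold:
            return k
    return None
```

Repeating this for several adaptation gains and several values of the filter order gives the (TC, PG) pairs that make up the curves of the AFOC.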

We define the following complexity index (CF) for
comparing adaptation schemes:

$$ CF = \alpha_1 N_{MUL} + \alpha_2 N_{ADD} + \alpha_3 N_{CON} \tag{2.1-9} $$

where $N_{MUL}$, $N_{ADD}$, $N_{CON}$ are the numbers of multiplication,
addition and control operations used in one iteration, and
$\alpha_1$, $\alpha_2$, $\alpha_3$ are weighting coefficients representing
the cost of each operation. A reasonable approximation which
neglects control operations is

$$ CF = \alpha\,N_{MUL} + N_{ADD} \tag{2.1-10} $$


Using the equations for the LMS, FSC and SLMS techniques we have
the following complexity indices as a function of the number
of filter weights, $N_a$:

$$ CF_{LMS} = (2N_a + 1)\,\alpha + 2N_a + 1 \tag{2.1-11} $$

$$ CF_{FSC} = N_a\,\alpha + 2N_a + 1 \tag{2.1-12} $$

$$ CF_{SLMS} = (N_a + 1)\,\alpha + 2N_a + 1 \tag{2.1-13} $$


As a reasonable numerical example, using $\alpha = 5$, we have
approximately equal complexity with $N_{LMS} = 6$, $N_{FSC} = 11$,
$N_{SLMS} = 10$. The AFOC comparison for this complexity is
presented in Fig. 2.1-1B and indicates that for a given
convergence time the simplified gradient methods provide
higher processing gain.
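The numerical example can be checked directly from (2.1-11) to (2.1-13). The sketch below only evaluates those three expressions; the function names are hypothetical.

```python
def cf_lms(n_a, alpha):
    """Complexity index of the LMS algorithm, (2.1-11)."""
    return (2 * n_a + 1) * alpha + 2 * n_a + 1


def cf_fsc(n_a, alpha):
    """Complexity index of the FSC algorithm, (2.1-12)."""
    return n_a * alpha + 2 * n_a + 1


def cf_slms(n_a, alpha):
    """Complexity index of the SLMS algorithm, (2.1-13)."""
    return (n_a + 1) * alpha + 2 * n_a + 1


if __name__ == "__main__":
    alpha = 5
    print(cf_lms(6, alpha), cf_fsc(11, alpha), cf_slms(10, alpha))
```

With a multiplication weighted five times an addition, the three indices come out at 78, 78 and 76, so an LMS filter with 6 weights costs about as much per iteration as an FSC filter with 11 weights or an SLMS filter with 10.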
2.2   CONVERGENCE AND STABILITY

In this section we discuss conditions for the convergence
and the stability of the simplified gradient estimates. A
stable adaptive filter is one that converges to a near
optimal steady state. We now define the convergence ratio,
$C_i(k)$:

$$ C_i(k) = \frac{a_i(k+1) - a_i^*}{a_i(k) - a_i^*} \tag{2.2-1} $$

where $a_i^*$ is the optimal value for the weight $a_i$.

Following Widrow et al. [1, 3, 4] we define the weight
noise, $V_i(k)$, as:

$$ V_i(k) = a_i(k) - a_i^* \tag{2.2-2} $$

Combining (2.2-1) and (2.2-2) gives:

$$ C_i(k) = \frac{V_i(k+1)}{V_i(k)} \tag{2.2-3} $$

From (2.2-2) and (1.2-6) we get:

$$ V_i(k+1) = V_i(k) - \mu_a\,\nabla_{a_i}(k) \tag{2.2-4} $$

Combining (2.2-3) and (2.2-4) we get:

$$ C_i(k) = 1 - \mu_a\,\frac{\nabla_{a_i}(k)}{V_i(k)} \tag{2.2-5} $$


The steady state average convergence ratio is defined as:

$$ C_i = E\{C_i(k)\} = 1 - \mu_a\,E\left\{\frac{\nabla_{a_i}(k)}{V_i(k)}\right\} \tag{2.2-6} $$

where k is large enough for operation of the filter to be in
steady state. From this point we proceed with the specific
case of the SLMS, with its gradient estimate given by (2.1-6).
The error as a function of the weight noise is given from (2.2-2)
and (1.2-4) as:

$$ \varepsilon_r(k) = \varepsilon_r^*(k) + \sum_{j=0}^{N_a-1} V_j(k)\,x(k-j) \tag{2.2-7} $$


where $\varepsilon_r^*(k)$ is the optimal error at time k and is given by:

$$ \varepsilon_r^*(k) = \sum_{i=0}^{N_a-1} a_i^*\,x(k-i) - r(k) \tag{2.2-8} $$


Inserting (2.2-7) into (2.1-6) results in the equation:

$$ \nabla_{SLMS}(i,k) = \varepsilon_r^*(k)\,\mathrm{Sgn}\{x(k-i)\} + \sum_{j=0}^{N_a-1} V_j(k)\,x(k-j)\,\mathrm{Sgn}\{x(k-i)\} \tag{2.2-9} $$


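Inserting (2.2-9) into (2.2-6) we get

$$ C_i = 1 - \mu_a \left\{ E\left\{\frac{\varepsilon_r^*(k)\,\mathrm{Sgn}\{x(k-i)\}}{V_i(k)}\right\} + \sum_{j=0}^{N_a-1} E\left\{\frac{V_j(k)}{V_i(k)}\,x(k-j)\,\mathrm{Sgn}\{x(k-i)\}\right\} \right\} \tag{2.2-10} $$

$\varepsilon_r^*(k)$ is independent of $x(k-i)$ and of $V_i(k)$ so that

$$ E\left\{\frac{\varepsilon_r^*(k)\,\mathrm{Sgn}\{x(k-i)\}}{V_i(k)}\right\} = E\{\varepsilon_r^*(k)\}\,E\left\{\frac{\mathrm{Sgn}\{x(k-i)\}}{V_i(k)}\right\} = 0 \tag{2.2-11} $$

because $E[\varepsilon_r^*(k)] = 0$.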

To continue with the simplification of (2.2-10) we make the
following assumptions:

    (a)  $V_j(k)$ and $x(k-j)$ are uncorrelated
    (b)  $E\{V_j(k)/V_i(k)\} = 1$                              (2.2-12)

Assumption (a) is similar to the uncorrelated input assumption
used by Widrow in [1] and seems to be justified by his
results. Assumption (b) is made for mathematical convenience
and can be justified by the dependence of the weight noises
on the common error terms and the uniform statistics of the
input signal over the filter memory.

Using (2.2-11) and (2.2-12) in (2.2-10) we get:

$$ C_i = 1 - \mu_a \sum_{j=0}^{N_a-1} E\{x(k-j)\,\mathrm{Sgn}[x(k-i)]\} \tag{2.2-13} $$

Since $x(k-j)\,\mathrm{Sgn}\{x(k-i)\} \le |x(k-j)|$, the sum in (2.2-13)
is bounded from above, which bounds $C_i$ from below:

$$ C_i \ge 1 - \mu_a \sum_{j=0}^{N_a-1} E|x(k-j)| \tag{2.2-14} $$

For stationary input signals $E|x(k-j)| = E|x(k)|$ for all
values of j and we get:

$$ C_i \ge 1 - \mu_a\,N_a\,E|x(k)| \tag{2.2-15} $$


For stable operation, as well as for convergence to the
optimal weight values, we require

$$ |C_i| < 1 \tag{2.2-16} $$

Manipulating (2.2-15) and (2.2-16) to obtain the stability
condition for $\mu_a$ yields for the SLMS algorithm:

$$ 0 < \mu_{SLMS} < \frac{2}{N_a\,E|x(k)|} \tag{2.2-17} $$


To express (2.2-17) as a function of the input power, $R_{xx}(0)$,
we can define the input signal form factor, $F_x$, as:

$$ F_x = \frac{E|x(k)|}{\sqrt{E\{x^2(k)\}}} \tag{2.2-18} $$

Now inserting (2.2-18) in (2.2-17) results in:

$$ 0 < \mu_{SLMS} < \frac{2}{N_a\,F_x\,\sqrt{R_{xx}(0)}} \tag{2.2-19} $$


For the LMS algorithm we can use (2.2-6) and the LMS gradient
estimate. Following the above derivation and using the
assumptions of (2.2-12) we get

$$ 0 < \mu_{LMS} < \frac{1}{N_a\,R_{xx}(0)} \tag{2.2-20} $$

(2.2-20) is equivalent to equation (32) in [4], which was
derived in a different manner but with similar assumptions.
To derive the stability region of the FSC algorithm we
use (2.2-17) and the relationship between the FSC and the
SLMS algorithms. We define an equivalent adaptation
gain, $\mu_{eq}$, by the formula

$$ \mu_{FSC} = \mu_{eq}\,E|\varepsilon_r(k)| \tag{2.2-21} $$

It is interesting to note that we are now using the derivation
process of Section 2.1 for the SLMS algorithm in a
reverse direction. The case of greatest interest is that of
a low signal to noise ratio. For this case we use the
following approximations:

$$ \varepsilon_r(k) = y(k) - s(k+1) - n(k+1) \approx -n(k+1) \approx -r(k) $$

and

$$ E|\varepsilon_r(k)| \approx E|r(k)| \approx E|x(k)| \tag{2.2-22} $$

Inserting $\mu_{eq}$ from (2.2-21) into the SLMS relation given by
(2.2-17), with the use of (2.2-22), results in the following:

$$ 0 < \mu_{FSC} < 2/N_a \tag{2.2-23} $$


The foregoing relationships, (2.2-17), (2.2-20) and (2.2-23),
are based upon average behavior of the algorithms. In
practice, to avoid numerical overflow, we must use adaptation
gain values much smaller than the upper limit indicated in the
above relations. An additional consideration that also results
in a smaller adaptation gain is the misadjustment. For all
the algorithms, the use of the upper bound value for the
adaptation gain results in a misadjustment of the order of the
optimal filter gain (PF), which means that in practice we are
restricted to much lower values of the adaptation gain. The
results of this section are included in Table 2.6-1.
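The three stability limits of this section can be evaluated from a record of input samples as sketched below. This is an illustration only: the function name is hypothetical, and the sample averages are used as stand-ins for the expectations E|x(k)| and R_xx(0), which assumes a stationary input.

```python
import math


def stability_limits(x, n_a):
    """Upper limits on the adaptation gain from (2.2-17), (2.2-20), (2.2-23).

    x   -- sequence of input samples (assumed stationary)
    n_a -- number of filter weights N_a
    """
    mean_abs = sum(abs(v) for v in x) / len(x)    # estimate of E|x(k)|
    power = sum(v * v for v in x) / len(x)        # estimate of R_xx(0)
    return {
        "LMS": 1.0 / (n_a * power),               # (2.2-20)
        "SLMS": 2.0 / (n_a * mean_abs),           # (2.2-17)
        "FSC": 2.0 / n_a,                         # (2.2-23), low SNR case
        "form_factor": mean_abs / math.sqrt(power),   # F_x of (2.2-18)
    }
```

As noted above, these are limits on the average behavior; a practical gain would be chosen well below them.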
2.3   CONVERGENCE TIME ESTIMATION

In order to estimate the convergence time of an adaptive
filter one may visualize the process as changing the weights
with some average step, $\Delta$, taken in most of the iterations
towards the optimal value of the weight. Assuming an initial
value of zero for all the weights, the longest convergence
time will be associated with the weight having the largest
absolute value, $a_{max}$. From the above it is reasonable to assume
the following relationship:

$$ TC = \alpha_1\,\frac{a_{max}}{\Delta}\,N_a^{\alpha_2} \tag{2.3-1} $$

where $N_a$ is the number of filter weights, and $\alpha_1$, $\alpha_2$ are
unknown coefficients. $a_{max}/\Delta$ is the exact number of steps
needed for convergence if the correction is always in the
right direction. In practice the gradient estimation causes
errors in the direction, and the number of iterations
required to converge to the optimal value of the weights is
modified by a factor that depends in some non-linear way on
the number of weights $N_a$. This modification is represented
in (2.3-1) by the factor $\alpha_1 N_a^{\alpha_2}$. Also $\alpha_1$ depends on the
exact definition of TC (i.e. 10% or $e^{-1}$ of the initial error
squared). Filter operation involves a linear combination of
input values. Since the reference amplitude is independent
of $N_a$, when we combine more input samples the relative weight
associated with each sample should be smaller; mathematically:

$$ a_{max} = \frac{\alpha_3}{N_a} \tag{2.3-2} $$


In general, $\alpha_3$ depends upon the input signal to noise ratio
as discussed in the literature [3, 33]. This dependency is
not taken into account in the derivations which follow in
order to simplify comparison of the new algorithms with
existing algorithms. The results of reference [37] can be
used to modify the results presented here to include the
dependence upon input signal to noise ratio.

When looking at specific applications, such as Adaptive
Line Enhancement (ALE), one can determine the value of $\alpha_3$ in
(2.3-2) exactly. Inserting (2.3-2) into (2.3-1) and absorbing
$\alpha_3$ into $\alpha_1$, we write:

$$ TC = \frac{\alpha_1\,N_a^{\alpha_2}}{\Delta\,N_a} \tag{2.3-3} $$

$\Delta$ in (2.3-3) depends on the adaptive scheme. It is the
fixed step size in the FSC algorithm and an average step
size for the LMS and the SLMS algorithms. Thus for these
three cases we define:

$$ \Delta_{FSC} = \mu_{FSC} \tag{2.3-4} $$

$$ \Delta_{LMS} = E\{|\mu_{LMS}\,\nabla_{LMS}|\} = 2\mu_{LMS}\,E\{|\varepsilon(k)\,x(k-i)|\} \tag{2.3-5} $$

$$ \Delta_{SLMS} = E\{|\mu_{SLMS}\,\nabla_{SLMS}|\} = \mu_{SLMS}\,E\{|\varepsilon(k)\,\mathrm{Sgn}\{x(k-i)\}|\} \tag{2.3-6} $$

Using (2.3-3) and (2.3-4) and the empirical coefficients $\alpha_1 = 1.65$,
$\alpha_2 = 1/2$, as evaluated using the simulations described in
Appendix A, we get the following FSC convergence time to 10%
of the initial squared error:

$$ TC = \frac{1.65}{\mu_{FSC}\,\sqrt{N_a}} \tag{2.3-7} $$

where TC is the time required to reduce the error to 10% as
defined by (2.1-8). Fig. 2.3-1 presents a verification of
(2.3-7) using simulation results with several values of $\mu_{FSC}$,
$N_a$, and the input power $R_{xx}(0)$.

The significance of these results is that they confirm
that the convergence time is inversely proportional to the
adaptation gain and to the square root of the number of weights.

Assuming in (2.3-5) that
$E\{|\varepsilon(k)|\cdot|x(k-i)|\} = E|\varepsilon(k)|\cdot E|x(k-i)|$ we get:

$$ \Delta_{LMS} \approx 2\mu_{LMS}\,E|\varepsilon(k)|\,E|x(k-i)| \tag{2.3-8} $$

At the start of the adaptation process the initial weights
have a value of zero, so that $y(0) = 0$ and $\varepsilon(0) = r(0)$. For
the correlated reference case the reference power is
essentially the same as the input power and we have:

$$ \Delta_{LMS} = 2\mu_{LMS}\,E|x(k)|\cdot E|x(k)| \tag{2.3-10} $$

Using expression (2.3-10) for the LMS average step size in
(2.3-3) with $\alpha_2 = 1/2$ we have

$$ TC_{LMS} = \frac{\alpha_{LMS}}{\mu_{LMS}\,(E|x(k)|)^2\,\sqrt{N_a}} \tag{2.3-11} $$


In [4] the classical LMS convergence time estimate is given
by:

$$ TC_{LMS} = \frac{\ln 10}{4\,\mu_{LMS}\,R_{xx}(0)} \tag{2.3-12} $$

Based on the simulation described in Appendix A we select
$\alpha_{LMS} = .555$.

Fig. 2.3-2 presents a comparison of simulation results
with the classical convergence time formula, (2.3-12), and
the new convergence time formula, (2.3-11). This figure
indicates clearly that the convergence time depends upon the
number of weights, $N_a$, as developed in (2.3-11), and that
this formulation is more accurate than that of (2.3-12), which
was developed in reference [4]. In a similar way (2.3-6) and
(2.3-3) give

$$ TC_{SLMS} = \frac{\alpha_{SLMS}}{\mu_{SLMS}\,\sqrt{N_a}\,E|x(k)|} \tag{2.3-13} $$

Fig. 2.3-3 presents a comparison of (2.3-13) with simulation
results, with $\alpha_{SLMS} = 1.4$, based upon the results of
simulations described in Appendix A. The comparison confirms
(2.3-13). The key formulas of this section are included in
Table 2.6-1.
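The convergence time estimates of this section can be collected into a few helper functions for quick comparison. This is a sketch with hypothetical function names; the default coefficients 1.65, .555 and 1.4 are the empirical values quoted in the text for the ALE simulations of Appendix A and should not be assumed to hold for other signals.

```python
import math


def tc_fsc(mu, n_a, alpha=1.65):
    """FSC convergence time, (2.3-7)."""
    return alpha / (mu * math.sqrt(n_a))


def tc_lms_new(mu, n_a, mean_abs_x, alpha=0.555):
    """LMS convergence time with the dependence on N_a, (2.3-11)."""
    return alpha / (mu * mean_abs_x ** 2 * math.sqrt(n_a))


def tc_lms_classical(mu, r_xx0):
    """Classical LMS convergence time estimate, (2.3-12)."""
    return math.log(10.0) / (4.0 * mu * r_xx0)


def tc_slms(mu, n_a, mean_abs_x, alpha=1.4):
    """SLMS convergence time, (2.3-13)."""
    return alpha / (mu * math.sqrt(n_a) * mean_abs_x)
```

For example, with a gain of .005, 16 weights and E|x(k)| = 1, the FSC and SLMS estimates are 82.5 and 70 iterations respectively.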
2.4   STEADY STATE ERROR AND MISADJUSTMENT

In order to evaluate the steady-state error we start with
general relationships. First, following [1, 4, 3], we define the
weight noise $v_i(k)$ as:

$$ v_i(k) = a_i(k) - a_i^* \tag{2.4-1} $$

where $a_i^*$ is the optimal ith weight. The filter output can
then be written as:

$$ y(k) = \sum_{i=0}^{N_a-1} a_i(k)\,x(k-i) = \sum_{i=0}^{N_a-1} a_i^*\,x(k-i) + \sum_{i=0}^{N_a-1} v_i(k)\,x(k-i) \tag{2.4-2} $$


Define

$$ \varepsilon_s(k) = y(k) - s(k) = \sum_{i=0}^{N_a-1} a_i^*\,x(k-i) - s(k) + \sum_{i=0}^{N_a-1} v_i(k)\,x(k-i) \tag{2.4-3} $$

We can now define the optimal instantaneous error:

$$ \varepsilon_s^*(k) = \sum_{i=0}^{N_a-1} a_i^*\,x(k-i) - s(k) \tag{2.4-4} $$


Using this value the minimum mean squared error is

$$ J_{min} = E\{\varepsilon_s^{*2}(k)\}, \text{ with } k \text{ in the steady state.} \tag{2.4-5} $$

From (2.4-3) and (2.4-4) it follows that:

$$ \begin{aligned}
J_{ss} &= J(k)\,\big|_{k \text{ in the steady state}} = E\Big\{\Big[\varepsilon_s^*(k) + \sum_{i=0}^{N_a-1} v_i(k)\,x(k-i)\Big]^2\Big\} \\
&= E\{\varepsilon_s^{*2}(k)\} + E\Big\{2\sum_{i=0}^{N_a-1} v_i(k)\,\varepsilon_s^*(k)\,x(k-i)\Big\} \\
&\quad + E\Big\{\sum_{i_1=0}^{N_a-1}\sum_{i_2=0}^{N_a-1} v_{i_1}(k)\,v_{i_2}(k)\,x(k-i_1)\,x(k-i_2)\Big\}
\end{aligned} \tag{2.4-6} $$


The following is assumed:

    (a)  The expectation of $v_i(k)\,x(k-i)$ is factorable.

    (b)  $E\{v_i(k)\,v_j(k)\} = \overline{v^2}\,\delta_{ij}$,
         where $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \ne j$.

Assumptions (a) and (b) appear to be well justified in the
case of a correlated reference signal, as confirmed by the
agreement obtained between the derived formulas and the
simulation results.




The second term of (2.4-6) can be factored:

$$ E\{v_i(k)\,\varepsilon_s^*(k)\,x(k-i)\} = E\{v_i(k)\}\cdot E\{\varepsilon_s^*(k)\,x(k-i)\} \tag{2.4-7} $$

However, $E\{\varepsilon_s^*(k)\,x(k-i)\} = 0$ (because of the orthogonality of
the optimal solution) so that the second term is zero and
(2.4-6), using assumptions (a) and (b), becomes:

$$ J_{ss} = J_{min} + \sum_{i=0}^{N_a-1} \overline{v^2}\,E\{x(k-i)\,x(k-i)\} = J_{min} + N_a\,\overline{v^2}\,R_{xx}(0) \tag{2.4-8} $$


We now define the excess MSE as follows

$$ J_e = J_{ss} - J_{min} = N_a\,\overline{v^2}\,R_{xx}(0) \tag{2.4-9} $$

and the misadjustment as:

$$ M = \frac{J_e}{J_{min}} = \frac{N_a\,\overline{v^2}\,R_{xx}(0)}{J_{min}} \tag{2.4-10} $$

The foregoing depends upon $J_{min}$, $N_a$, $\overline{v^2}$, and $R_{xx}(0)$.
$R_{xx}(0)$ depends upon the statistics of the input. $J_{min}$ depends
upon the input statistics as well as $N_a$. However, $\overline{v^2}$ depends
upon the nature of the specific algorithm and will now be considered
for the SLMS and FSC algorithms. From (2.4-1) and (1.2-6) we have:

$$ v_i(k+1) = v_i(k) - \mu_a\,\nabla_{a_i}(k) \tag{2.4-11} $$


Squaring both sides we get:

$$ v_i^2(k+1) = v_i^2(k) + \mu_a^2\,\nabla_{a_i}^2(k) - 2\mu_a\,v_i(k)\,\nabla_{a_i}(k) \tag{2.4-12} $$


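For the SLMS algorithm:

$$ v_i(k)\,\nabla_{a_i}(k) = v_i(k)\,\varepsilon_r(k)\,\mathrm{Sgn}\{x(k-i)\} = v_i(k)\Big\{\varepsilon_r^*(k) + \sum_{j=0}^{N_a-1} v_j(k)\,x(k-j)\Big\}\,\mathrm{Sgn}\{x(k-i)\} \tag{2.4-13} $$

where

$$ \varepsilon_r(k) = y(k) - r(k) \tag{2.4-14} $$

and

$$ \varepsilon_r^*(k) = \sum_{i=0}^{N_a-1} a_i^*\,x(k-i) - r(k) \tag{2.4-15} $$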

$\varepsilon_r(k)$ and $\varepsilon_r^*(k)$ depend on the reference signal $r(k)$, and in
many cases, including the correlated reference case, those
errors and the previously defined $\varepsilon_s(k)$ and $\varepsilon_s^*(k)$ have
completely different statistics. $v_i(k)$ and $x(k-i)$ are
independent of $\varepsilon_r^*(k)$ so we have

$$ E\{v_i(k)\,\varepsilon_r^*(k)\,\mathrm{Sgn}\{x(k-i)\}\} = E\{\varepsilon_r^*(k)\}\cdot E\{v_i(k)\,\mathrm{Sgn}\{x(k-i)\}\} = 0 \tag{2.4-16} $$


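Taking the expectation of (2.4-13) and using (2.4-16), we get:

$$ E\{v_i(k)\,\nabla_{a_i}(k)\} = \sum_{j=0}^{N_a-1} E\{v_i(k)\,v_j(k)\,x(k-j)\,\mathrm{Sgn}\{x(k-i)\}\} \tag{2.4-17} $$

Using assumptions (a) and (b) in (2.4-17), only the term with
$j = i$ remains, and since $x(k-i)\,\mathrm{Sgn}\{x(k-i)\} = |x(k-i)|$ we have:

$$ E[v_i(k)\,\nabla_{a_i}(k)] = \overline{v^2}\,E|x(k)| \tag{2.4-18} $$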


Taking the expectation of (2.4-12) in the steady state and
using (2.4-18) we have:

$$ E[v_i^2(k+1)] = E[v_i^2(k)] + \mu_a^2\,E\{\varepsilon_r^2(k)\,\mathrm{Sgn}^2[x(k-i)]\} - 2\mu_a\,\overline{v^2}\,E|x(k)| \tag{2.4-19} $$


In the steady state $E[v_i^2(k+1)] = E[v_i^2(k)]$ and, since
$\mathrm{Sgn}^2[x(k-i)] = 1$ for nonzero input samples, from (2.4-19)
we can express $\overline{v^2}$ as:

$$ \overline{v^2} = \frac{\mu_a\,E[\varepsilon_r^2(k)]}{2\,E|x(k)|} \tag{2.4-20} $$

Now we can insert (2.4-20) into (2.4-10) and get:

$$ M = \frac{\mu_a\,N_a\,E[\varepsilon_r^2(k)]\,R_{xx}(0)}{2\,J_{min}\,E|x(k)|} \tag{2.4-21} $$


$E[\varepsilon_r^2(k)]$ depends upon the type of reference used. For the
correlated reference we have:

$$ \varepsilon_r(k) = y(k) - r(k) = y(k) - [s(k+1) + n(k+1)] \tag{2.4-22} $$

By squaring and taking the expectation of (2.4-22) we get:

$$ E[\varepsilon_r^2(k)] = E\{[y(k) - s(k+1)]^2\} + E[n^2(k+1)] - 2E\{y(k)\,n(k+1) - s(k+1)\,n(k+1)\} \tag{2.4-23} $$

In the third term of (2.4-23), $n(k+1)$ is independent of $s(k+1)$,
and the present output $y(k)$ is independent of the future noise
$n(k+1)$, so this term's expectation is zero. Because of the
correlation of $s(k+1)$ and $s(k)$, which is a basic requirement
for the use of the correlated reference, the first term of
(2.4-23) will be:

$$ E\{[y(k) - s(k+1)]^2\} \approx E\{[y(k) - s(k)]^2\} = E[\varepsilon_s^2(k)] = J_{ss} \tag{2.4-24} $$

For stationary noise we have

$$ R_{nn}(0) = E[n^2(k)] = E[n^2(k+1)] \tag{2.4-25} $$

Using (2.4-24) and (2.4-25) and the above reasoning about the
third term, (2.4-23) becomes:

$$ E[\varepsilon_r^2(k)] = J_{ss} + R_{nn}(0) \tag{2.4-26} $$


For reasonable processing gain $J_{ss} \ll R_{nn}(0)$ and

$$ E[\varepsilon_r^2(k)] \approx R_{nn}(0) \tag{2.4-27} $$

The optimal processing factor (PF) of a filter is defined as

$$ PF = \frac{R_{nn}(0)}{J_{min}} \tag{2.4-28} $$


PF expresses the optimal noise reduction possible by an optimal
filter of order $N_a$. PF depends upon $N_a$ and the signal
statistics, and does not depend upon the adaptation algorithm
and the adaptation gain.

Inserting (2.4-27) into (2.4-21) we get:

$$ M = \frac{\mu_a\,N_a\,R_{nn}(0)\,R_{xx}(0)}{2\,E|x(k)|\,J_{min}} \tag{2.4-29} $$


Using definition (2.4-28) in (2.4-29) we get:

$$ M = \frac{\mu_a\,N_a\,R_{xx}(0)}{2\,E|x(k)|}\cdot PF \tag{2.4-30} $$

Fig. 2.4-1 presents a verification of (2.4-30), using simulation
results with several values of $\mu_{SLMS}$, $N_a$, and the input
power $R_{xx}(0)$.

Equation (2.4-30) was derived for the SLMS algorithm. To
derive an equivalent expression for the FSC algorithm, we use
(2.4-30) and the relationship between the FSC and the SLMS
algorithms given by (2.2-21). The case of greatest interest
is that of a low signal to noise ratio; for this case we can
use the approximation given by (2.2-22). Inserting $\mu_{eq}$ from
(2.2-21) into the SLMS relation given by (2.4-30), with the
use of (2.2-22), results in the following:

$$ M = \frac{\mu_{FSC}\,N_a\,R_{xx}(0)}{2\,[E|x(k)|]^2}\cdot PF \tag{2.4-31} $$


Equation (2.4-31) provides an estimate of the misadjustment
of the FSC algorithm and Fig. 2.4-2 illustrates its agreement
with the simulations. It should be noted that because
(2.4-31) relies on the approximation of (2.2-22), the accuracy
of (2.2-22), and hence the accuracy of (2.4-31), should improve
for lower signal to noise ratios.
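The misadjustment estimates (2.4-30) and (2.4-31) are simple enough to evaluate directly. The following is a sketch with hypothetical function names; the numerical inputs in the usage note are illustrative values, not results from the simulations.

```python
def processing_factor(r_nn0, j_min):
    """Optimal processing factor PF = R_nn(0) / J_min, (2.4-28)."""
    return r_nn0 / j_min


def misadjustment_slms(mu, n_a, r_xx0, mean_abs_x, pf):
    """SLMS misadjustment estimate, (2.4-30)."""
    return mu * n_a * r_xx0 / (2.0 * mean_abs_x) * pf


def misadjustment_fsc(mu, n_a, r_xx0, mean_abs_x, pf):
    """FSC misadjustment estimate for low signal to noise ratio, (2.4-31)."""
    return mu * n_a * r_xx0 / (2.0 * mean_abs_x ** 2) * pf
```

For instance, with a gain of .005, 10 weights, unit input power, E|x(k)| = 0.8 and PF = 4 (all assumed values), the SLMS estimate is 0.125 and the FSC estimate is about 0.156.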
Fig. 2.4-1
SLMS Misadjustment, Simulation Results and Theory

(Panels: A. SLMS misadjustment vs. $\mu_a$, with $N_a = 10$, $R_{xx}(0) = 1$;
B. SLMS M vs. $N_a$, with $\mu_a = .005$, $R_{xx}(0) = 1$;
C. SLMS M vs. $R_{xx}(0)$, with $N_a = 10$, $\mu_a = .005$.
Legend: simulation, average of 100 runs; theory estimation.)
Fig. 2.4-2
FSC Misadjustment, Simulation Results And Theory

(Panels: A. FSC misadjustment vs. $\mu_a$, with $N_a = 10$, $R_{xx}(0) = 1$;
B. FSC M vs. $N_a$, with $\mu_a = .005$, $R_{xx}(0) = 1$;
C. FSC M vs. $R_{xx}(0)$, with $N_a = 10$, $\mu_a = .005$.
Legend: simulation, average of 100 runs; theory estimation.)


2.5   DESIGN  CONSIDERATIONS 

The  design  problem  of  a  FIR  adaptive  filter  involves  the 
following  major  points: 

(a)  Selection of an algorithm: LMS, FSC, SLMS.

(b)  Determination of the order of the filter, $N_a$.

(c)  Determination of the adaptation gain, $\mu_a$.

The  discussion  which  follows  does  not  consider  the  following 
points  : 

(a)  A  possible  IIR  filter  solution. 

(b)  Implementation  details. 

(c)  Minimization of a given design criterion, such
as cost, volume, weight, etc.

The adaptive filter is usually part of a larger system which
sets its design requirements. The adaptive filter specifications
that we consider here are a desired processing gain
and an upper limit to the convergence time. The additional
information required for the design is some specification of
the expected input signal to the adaptive filter. Realizing
that a complete analysis of adaptive filter behavior is not
possible for complicated signals, we consider a design procedure
based upon simulation and a graphic presentation of the
adaptive filter properties, the adaptive filter operating
characteristic (AFOC) as defined in section 2.1. As an
example, we consider enhancement of a single sine wave of
unknown frequency with a signal to white noise ratio of 0 dB.
The desired processing gain is 8 dB and the allowed convergence
time is 100 iterations.

Table 2.5-1 outlines the suggested design procedure and
presents the application of this procedure to the foregoing
example. Since we do not specify implementation details,
step (6) of the procedure cannot be carried out for the
example. Hence the example is done for the SLMS algorithm
only.

Table 2.5-1


FIR Adaptive Filter Design Procedure

Step 1.  Test Signal Selection
    Description:  Define a test signal (or several test signals)
                  for which the filter performance will be
                  evaluated. The dynamic range of the input signal
                  should be considered at this point, and might
                  influence the selection of the test signals. The
                  test signal might be average, worst case, or
                  several typical signals.
    Example:      Sine wave plus white noise with signal to noise
                  ratio of 0 dB.

Step 2.  Simulation
    Description:  Use simulation to generate an Adaptive Filter
                  Operating Characteristic (AFOC) with $N_a$ as a
                  parameter for the LMS, FSC, SLMS algorithms for
                  each of the test signals selected in step 1.
    Example:      Fig. 2.5-1

Step 3.  Determination of $N_a$ for each test signal
    Description:  For each of the algorithms draw, on the AFOC
                  plots, lines for the desired processing gain and
                  convergence time. For each algorithm select the
                  smallest number of weights that meets the
                  requirements. At this point the designer might
                  consider trade-offs in $N_a$, PG, TC.
    Example:      In Fig. 2.5-1 we select $N_a = 14$.

Step 4.  Determination of $\mu_a$ for each test signal
    Description:  Since each curve on the AFOC is constructed for
                  several values of $\mu_a$, one can use this data and
                  the values of $N_a$, TC and PG selected in step 3
                  to determine the appropriate value for the
                  adaptation gain.
    Example:      In Fig. 2.5-2 we select $\mu_a = .004$.

Step 5.  Select optimal parameters for each algorithm
    Description:  Using the information from all the test signals
                  we need to:
                  (1) Select a single $N_a^*$ and $\mu_a^*$ for each
                      algorithm.
                  (2) For the three algorithms evaluate the
                      performance with this $N_a^*$, $\mu_a^*$ for each
                      test signal.
                  (3) Determine for the LMS and SLMS algorithms
                      whether adjustment for dynamic range is
                      required.
    Example:      In Fig. 2.5-3 we have PG = 7.9 dB, TC = 114. This
                  performance is marginal and a higher order filter
                  should be considered.

Step 6.  Selection of algorithm
    Description:  At this point, in order to complete the
                  specification of the filter, the designer can
                  compare the resulting complexity of the three
                  candidates and select the best one. The decision
                  depends upon implementation details.
    Example:      (not carried out for the example)
Fig. 2.5-1

AFOC For The SLMS Algorithm, With Details Of The Design Example.
Input Signal Is A Sine Wave Plus White Noise With
SNR = 0 dB.

(Plot against convergence time TC, with curves for $N_a$ = 5, 10,
15, 20 and 35, and lines marking the required processing gain
and the required convergence time.)
Fig. 2.5-2

Convergence Time As Function Of Adaptation Gain For SLMS With
$N_a = 14$. For TC = 100 We Select $\mu_a = .004$

(Plot title: Convergence Time vs. Adaptation Gain. Legend: SLMS,
average of 100 runs.)
Fig. 2.5-3

Typical Operation Of The Filter With The Selected
Parameters.
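Step 3 of the design procedure in Table 2.5-1 (choosing the smallest filter order that meets both requirements) can be sketched in code as follows. The data layout and the function name are assumptions made for illustration, and the sample AFOC points in the usage note are invented placeholders, not values read from Fig. 2.5-1.

```python
def select_order(afoc, pg_required_db, tc_max):
    """Return the smallest N_a whose AFOC curve meets both requirements.

    afoc -- dict mapping N_a to a list of (TC, PG_dB) points, one point
            per simulated adaptation gain (hypothetical data layout)
    Returns None if no simulated order satisfies the specification.
    """
    for n_a in sorted(afoc):
        if any(tc <= tc_max and pg >= pg_required_db for tc, pg in afoc[n_a]):
            return n_a
    return None
```

With placeholder data such as `{5: [(50, 4.0), (200, 5.5)], 10: [(80, 7.0), (300, 8.5)], 14: [(100, 8.0), (400, 9.5)]}` and the example specification of 8 dB and 100 iterations, the function returns 14. A marginal result like the one reported in step 5 would prompt the designer to rerun the selection with a higher order filter included.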
2.6   CONCLUSION

The simplified gradient estimation algorithms, FSC and
SLMS, have processing gain and convergence time similar to
the classical LMS algorithm, as shown in section 2.1. This
similarity of performance has been confirmed when all the
filters which were compared have the same order, $N_a$. Thus,
when one considers the implementation savings of the simplified
algorithms, the comparison favors the simplified versions.

Analytical  comparison  of  the  algorithms  is  possible  using 
the  results  in  sections  2.2,  2.3,  and  2.4,  which  were  developed 
for  the  adaptive  line  enhancer. 

A summary of these properties is presented in Table 2.6-1 and compared with the LMS algorithm properties taken from [4].

Since $E|x(k)| = k\sqrt{R_{xx}(0)}$, it is clear from Table 2.6-1 that the dynamic range of the FSC algorithm is the best, because M and TC are not functions of $R_{xx}(0)$. The LMS algorithm has the poorest dynamic range, since M and TC depend on $R_{xx}(0)$. The SLMS algorithm is in the middle, since M and TC depend on the square root of $R_{xx}(0)$. Fig. 2.6-1 presents the dynamic range properties of these algorithms. Finally, a systematic approach to efficient adaptive filter design has been outlined.

Table 2.6-1

Summary of Algorithms Properties Derived for ALE Example

Algorithm   Misadjustment, M                                        Convergence Time, TC                                  Stability Limit

LMS         $[\mu_a N_a R_{xx}(0)]\,PF$                             $\ln 10 \,/\, [4\mu_a R_{xx}(0)]$                     $1/[N_a R_{xx}(0)]$
            (see note 2)                                            (Eq. 2.3-12, see note 1)                              (Eq. 2.2-20)

FSC         $[\mu_a N_a R_{xx}(0) \,/\, 2(E|x(k)|)^2]\,PF$          $1.65 \,/\, [\mu_a \sqrt{N_a}]$                       $2/N_a$
            (Eq. 2.4-31)                                            (Eq. 2.3-7)                                           (Eq. 2.2-23)

SLMS        $[\mu_a N_a R_{xx}(0) \,/\, 2E|x(k)|]\,PF$              $1.4 \,/\, [\mu_a \sqrt{N_a}\,E|x(k)|]$               $2/[N_a E|x(k)|]$
            (Eq. 2.4-30)                                            (Eq. 2.3-14, with $\alpha_{SLMS}$ = 1.4)              (Eq. 2.2-17)

$PF = R_{nn}(0) / J_{min}$


Notes:
(1) A new LMS convergence time estimate, using (2.3-11) with $\alpha_{LMS} = .555$, is given by:

$$TC = \frac{0.555}{\mu_a (E|x(k)|)^2} = \frac{\alpha}{\mu_a R_{xx}(0)}$$

(2) The LMS relationship taken from [4] with modification for the ALE example. This modification is similar to the derivation of (2.4-30).

Fig. 2.6-1

Dynamic range of the algorithms, $N_a = 10$, $\mu_{LMS} = .0025$, $\mu_{FSC} = \mu_{SLMS} = .005$. ALE simulation results, average of 100 runs.

III.   RANDOM SEARCH IIR ADAPTIVE FILTERS

3.1   IIR  PERFORMANCE  SURFACE 

The  performance  surface  of  the  IIR  filter  is  much  more 
complicated  than  the  FIR  performance  surface  because  of  the 
feedback  of  previous  output  values  used  to  form  the  present 
output . 

Consider the filter:

$$y(k) = \sum_{i=0}^{N_a-1} a_i\,x(k-i) + \sum_{i=1}^{N_b} b_i\,y(k-i) \tag{3.1-1}$$

and the error $e(k) = y(k) - s(k)$. The MSE performance surface is given by:

$$J(\{a_i\},\{b_i\}) = E[e^2(k)] = E\Big\{\Big[\sum_{i=0}^{N_a-1} a_i\,x(k-i) + \sum_{i=1}^{N_b} b_i\,y(k-i) - s(k)\Big]^2\Big\} \tag{3.1-2}$$

In (3.1-2) k is large enough for operation of the filter to be in steady-state.

Manipulating (3.1-2) we get:

$$\begin{aligned}
J(\{a_i\},\{b_i\}) ={}& \sum_{i=0}^{N_a-1}\sum_{j=0}^{N_a-1} a_i a_j R_{xx}(i-j) + \sum_{i=1}^{N_b}\sum_{j=1}^{N_b} b_i b_j R_{yy}(i-j) + R_{ss}(0) - 2\sum_{j=1}^{N_b} b_j R_{sy}(j) \\
&+ 2\sum_{i=0}^{N_a-1}\sum_{j=1}^{N_b} a_i b_j R_{xy}(i-j) - 2\sum_{i=0}^{N_a-1} a_i R_{sx}(i)
\end{aligned} \tag{3.1-3}$$

where the correlation functions are:

$$\begin{aligned}
R_{xx}(m) &= E[x(k)\,x(k-m)] \\
R_{yy}(m) &= E[y(k)\,y(k-m)] \\
R_{sx}(m) &= E[s(k)\,x(k-m)] \\
R_{sy}(m) &= E[s(k)\,y(k-m)] \\
R_{xy}(m) &= E[x(k)\,y(k-m)] \\
R_{ss}(0) &= E[s(k)\,s(k)]
\end{aligned}$$

Equation (3.1-3) appears to be quadratic in the weights, but actually $R_{yy}(m)$, $R_{xy}(m)$ and $R_{sy}(m)$ also depend on the weights. The dependence of the performance surface (3.1-3) on the weights is of high order, and the surface has several minima, only one of which is the global minimum. To demonstrate the complexity of this performance surface consider the simple filter:

$$y(k) = a\,x(k) + b\,y(k-1) \tag{3.1-4}$$

One can recursively insert successive expressions for $y(k-i)$. Thus:

$$y(k) = a\,x(k) + b[a\,x(k-1) + b[a\,x(k-2) + b[a\,x(k-3) + \cdots]]] \tag{3.1-5}$$

or in a compact form:

$$y(k) = a\sum_{i=0}^{k} b^i\,x(k-i) \tag{3.1-6}$$

$$\begin{aligned}
J(a,b) &= E\{[y(k)-s(k)]^2\} \\
&= a^2\sum_{i=0}^{k}\sum_{j=0}^{k} b^{i+j}\,E[x(k-i)\,x(k-j)] + E[s^2(k)] - 2a\sum_{i=0}^{k} b^i\,E[x(k-i)\,s(k)] \\
&= a^2\sum_{i=0}^{k}\sum_{j=0}^{k} b^{i+j}\,R_{xx}(j-i) + R_{ss}(0) - 2a\sum_{i=0}^{k} b^i\,R_{sx}(i)
\end{aligned} \tag{3.1-7}$$


Since $R_{xx}(m)$ and $R_{sx}(m)$ do not depend on the values of the weights a, b, the degree of the performance surface given by (3.1-7) is already 2k in b, while it is quadratic in a. When the filter operates for a long time, $k \to \infty$ and (3.1-7) is an infinite sum and an infinite degree polynomial.

Elliott, Jacklin and Stearns [25] present an expression for the performance surface which is derived for the general case, with $N_a$ forward weights and $N_b$ backward weights; this expression is similar to (3.1-7).

The general case also has quadratic dependence on the a's and infinite polynomial dependence on the b's. Thus the use of gradient search methods to optimize a multimodal performance surface can be expected to result in a steady state around one of the minimum points, which is not necessarily the desired global minimum. The steady-state minimum point depends upon the initialization of the adaptive filter. This behavior has been demonstrated for Stearns' algorithm by Parikh and Ahmed [13].
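
The shape of (3.1-7) can be explored numerically. The following is a minimal sketch, not part of the original work: it evaluates the truncated double sum for an assumed white, unit-power input, with the desired signal assumed to be produced by the first-order filter (3.1-4) with hypothetical parameters `a0` and `b0`. All function and variable names are illustrative.

```python
def mse_first_order(a, b, r_xx, r_sx, r_ss0, k=200):
    """Evaluate the performance surface (3.1-7), truncated at k terms.

    r_xx(m) and r_sx(m) are callables returning correlation values and
    r_ss0 is R_ss(0). These names are illustrative assumptions.
    """
    quad = 0.0
    for i in range(k + 1):
        for j in range(k + 1):
            quad += b ** (i + j) * r_xx(j - i)
    cross = sum(b ** i * r_sx(i) for i in range(k + 1))
    return a * a * quad + r_ss0 - 2.0 * a * cross


# Assumed scenario: white unit-power input, and s(k) generated by the
# filter (3.1-4) with hypothetical parameters a0, b0.
a0, b0 = 0.5, 0.8
white = lambda m: 1.0 if m == 0 else 0.0
r_sx = lambda m: a0 * b0 ** m if m >= 0 else 0.0
r_ss0 = a0 ** 2 / (1.0 - b0 ** 2)

j_opt = mse_first_order(a0, b0, white, r_sx, r_ss0)   # at the true parameters
j_off = mse_first_order(a0, 0.3, white, r_sx, r_ss0)  # wrong feedback weight
```

Under these assumptions the surface vanishes at the true parameters and is positive elsewhere. For fixed b it is a parabola in a, while its dependence on b is a high-degree polynomial.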

3.2   THE RANDOM SEARCH CONCEPT

A random search method consists of evaluation of estimates of the performance surface at discrete sets of the filter weights. After evaluation of these performance surface estimates, a comparison is made and a minimum point is selected.

The method of selection of the filter weights for which the performance surface estimate is to be evaluated is very important. In order to have a useful adaptation scheme for non-stationary input signals a continuous search method is needed, in contrast to a possible two-phase method that has a global search phase followed by fine tuning. Of the several methods in the literature [19, 20, 21, 23, 30, 31, 33, 36], the needed continuity of operation is provided by the moving center method.

The center is the point in the parameter space $\{W_i\}$ with the lowest estimate of the performance function among the points that have been tested so far. The set of filter weights (or in general, system parameters) to be tested at the $\ell$th random search interval, $\{\tilde W_i\}_\ell$, is given by:

$$\tilde W_{i,\ell} = W_{i,\ell} + \mu g \quad \text{for all } i \tag{3.2-1}$$

where:

i is the parameter index

$\{W_i\}_\ell$ is the value of the center at the $\ell$th random search interval; $W_{i,\ell}$ is its ith element

g is a number independently generated for each weight from a Gaussian random number generator, with zero mean and unity variance

$\mu > 0$ determines the range covered by one step of the search and is taken to be the same for all weights, which need not be the case

$\{\tilde W_i\}_\ell$ is a set of randomly selected parameter values around the center point $\{W_i\}_\ell$, during the $\ell$th random search evaluation interval; $\tilde W_{i,\ell}$ is its ith element.

The test point (the set $\{\tilde W_i\}_\ell$) is tested; that is, the value of the performance surface at the test point, $\tilde J_\ell$, is estimated as $\hat{\tilde J}_\ell$ and compared to the current center estimated value, $\hat J_\ell$. If $\hat{\tilde J}_\ell \ge \hat J_\ell$, a new point in parameter space is selected using equation (3.2-1). If $\hat{\tilde J}_\ell < \hat J_\ell$, the test point corresponds to a lower performance surface value estimate, and the center moves to a new location; that is, we set $W_{i,\ell+1} = \tilde W_{i,\ell}$ for all i. Now (3.2-1) is used again to evaluate another point to be tested.

Fig. 3.2-1 presents a two parameter example of a moving center random search process. It should be noted that $\hat J_\ell$ and $\hat{\tilde J}_\ell$ are only estimates of the performance surface points $J_\ell$ and $\tilde J_\ell$, because the latter properly involve averaging over an infinite ensemble.

In order to use the random search method in adaptive filters we need to specify the performance function and to define some estimate of that function. Since we are comparing values of the performance function and not evaluating its gradient, one can select a complex non-analytic performance function. This possibility might be used to great advantage. However, in the following discussion we use the standard criterion of minimum mean squared error.

To evaluate an estimate of the performance surface we use two filters in parallel. One uses a set of weights that are the current center, so that this filter produces the output, $y(k)$. The second set of filter weights corresponds to a test point in the weight space. The filter output at the test point, $\tilde y(k)$, is used only during the adaptation process.

The performance surface estimates are time averages, which are the only reasonable estimates of the ensemble average that we can calculate on line, and are given by:

$$\hat J_\ell = \sum_{j=0}^{R-1} [y(k-j) - r(k-j)]^2 \tag{3.2-2}$$

$$\hat{\tilde J}_\ell = \sum_{j=0}^{R-1} [\tilde y(k-j) - r(k-j)]^2 \tag{3.2-3}$$

where $\tilde y(k)$ is the output of the filter at the test point and R is the number of input samples used to estimate the performance function for a given random search interval.

We have two types of iterations. The first type is the filter iteration, which processes each new input sample with a fixed set of filter weights and produces the outputs $y(k)$ and $\tilde y(k)$. The second type of iteration involves the random search selection of a new set of filter parameters, which occurs after R filter iterations. For each random search interval a new set of parameters is selected and tested. Fig. 3.2-2 presents the relationship of filter iterations to the random search interval.

Fig. 3.2-3 presents a flow chart of the basic random search adaptive filter algorithm.
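
The moving center procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a transcription of the flow chart of Fig. 3.2-3: the callable `estimate_cost` stands in for the time-average estimates (3.2-2) and (3.2-3), and the function name, its arguments and the quadratic test cost are all hypothetical.

```python
import random


def moving_center_search(estimate_cost, center, mu, intervals, seed=0):
    """Moving-center random search following (3.2-1).

    estimate_cost: callable returning a performance estimate for a list
                   of weights (an assumed interface).
    center:        initial weights.
    mu:            search range, common to all weights.
    intervals:     number of random search intervals.
    """
    rng = random.Random(seed)
    center = list(center)
    history = [estimate_cost(center)]
    for _ in range(intervals):
        # The center is re-evaluated in every interval, mirroring the
        # two filters that run in parallel over the same samples.
        j_center = estimate_cost(center)
        # Test point: center plus independent N(0, 1) numbers scaled by mu.
        test = [w + mu * rng.gauss(0.0, 1.0) for w in center]
        j_test = estimate_cost(test)
        if j_test < j_center:
            # The test point has the lower estimate: the center moves.
            center, j_center = test, j_test
        history.append(j_center)
    return center, history


# Usage with a hypothetical noise-free quadratic cost.
cost = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2
best, hist = moving_center_search(cost, [0.0, 0.0], mu=0.1, intervals=500)
```

With a noise-free cost the center estimate can never increase. With the noisy time-average estimates of a real adaptive filter, wrong decisions are possible, which is the effect discussed in the following sections.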

3.3   OPERATION OF THE RANDOM SEARCH ALGORITHM

We now use the random search algorithm presented in Fig. 3.2-3 to implement an adaptive IIR filter. The filter operation is given by:

$$y(k) = \sum_{i=0}^{N_a-1} a_i(k)\,x(k-i) + \sum_{i=1}^{N_b} b_i(k)\,y(k-i) \tag{3.3-1}$$

The weights $\{a_i(k)\}$ and $\{b_i(k)\}$ are functions of time and their variation is controlled by the algorithm of Fig. 3.2-3 and equation (3.2-1).

Details of the simulation are given in Appendix A.

Fig. 3.3-1 presents the operation of a random search IIR filter with $N_a = 3$, $N_b = 2$, $\mu_a = .01$, $\mu_b = .1$.

We now discuss these results starting with some basic filtering considerations.

The  poles  and  zeros  of  a  filter  should  be  located  so  that  the 
desired  spectral  components  pass  through  the  filter  and  the 
unwanted  components  are  rejected. 

For  an  adaptive  filter  we  also  need  to  match  the  filter 
output  amplitude  to  the  reference  signal  amplitude,  that  is 
there  is  a  gain  factor  which  must  be  adjusted  accurately  in 
the  adaptive  filter. 

The filter (3.1-1) has the following transfer function

$$H(z) = a_0\,\frac{\prod_{i=1}^{M_z}(z-q_i)}{\prod_{i=1}^{M_p}(z-p_i)} \tag{3.3-2}$$

where:

$q_i$ are the filter zeros. There are $M_z$ zeros.

$p_i$ are the filter poles. There are $M_p$ poles.

The parameter $a_0$ controls the gain of the filter. For the adaptive filter we can't use the concept of transfer function because the filter weights are time varying. However, we can consider an average steady state transfer function with weights that are the mean of the time varying weights. Thus $a_0$ is required to match the filter maximum output magnitude to the reference amplitude. In order to enhance a desired spectral component the filter should have a pole (or poles) near the spectral component, close to the unit circle. The effect of this pole on a signal at the same frequency would be to multiply its amplitude by a gain factor of $1/(1-p)$, where p is the pole's magnitude. Thus the output of the filter at the pole frequency is given by $a_0/(1-p)$ times the magnitude of the input signal, multiplied by a factor which depends upon the location of the other poles and zeros. If this output magnitude is to be equal to the reference signal amplitude, $a_0/(1-p)$ must have a specific accurate value, because all the other factors that determine the output amplitude (namely the location of the other poles and the zeros) have only one optimal value, and thus for the steady-state near optimal filter are fixed.

For good selectivity p is only a little smaller than unity, and $a_0/(1-p)$ is the ratio of two very small numbers. It is difficult to achieve accuracy for this ratio with random search adaptation on the forward weights (the a's) and the backward weights (the b's or the poles).
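
The sensitivity of the ratio can be seen in a small numerical example. The sketch below assumes the simplest case of a single real pole of magnitude p, for which the gain at the pole frequency is $a_0/(1-p)$. The numerical values are hypothetical and chosen only for illustration.

```python
def peak_gain(a0, p):
    """Gain a0 / (1 - p) of a single pole of magnitude p at its own frequency."""
    return a0 / (1.0 - p)


# Hypothetical values: the gain is matched to unity for p = 0.99.
nominal = peak_gain(0.01, 0.99)
# Moving the pole by only 0.005 toward the unit circle doubles the gain.
perturbed = peak_gain(0.01, 0.995)
```

A random step in the pole position of a size comparable to $(1-p)$ therefore changes the output amplitude by a large factor unless $a_0$ is corrected at the same time.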

As a result of the mismatch of filter output and reference signal amplitudes, decision mistakes occur when comparing the performance function estimates of the center and of the test point, with the result that a bad set of weights is sometimes chosen. These decision mistakes cause slower convergence, a smaller value of the steady-state p (which means lower processing gain), and possible instability. This type of behavior was experienced in our simulation. One solution that we tried was to use a smaller variance for the search on the feedforward weights. This approach turned out to be inferior to a new approach (which is presented in the next section) based on the use of gradient search on the feedforward weights and random search on the feedback weights.

3.4   RANDOM AND GRADIENT SEARCH (RGS)

The fact that the IIR filter's performance surface is a quadratic function of the feedforward weights, as discussed in section 3.1, means that for non-varying feedback weights the performance surface with respect to the a's is unimodal. A unimodal surface may be handled best using a gradient method, and it is possible to achieve any desired accuracy to overcome the problem of the purely random search scheme discussed in section 3.3.

Widrow and McCool [5] have compared a random search technique for an FIR filter with the LMS algorithm. The random search technique used was tailored to the unimodal situation and, nonetheless, resulted in inferior performance compared with the LMS steepest descent gradient search.

The question is how to make the feedback weights converge first, so that the feedforward weights would then converge to the global minimum.

The  cascaded  arrangement,  as  shown  in  Fig.  3.4-1  is 
suggested.   The  all  pole  section  comes  first,  and  is 
adaptively  controlled  by  a  random  search  algorithm.   A 
second  all  zeros  section  is  then  adaptively  controlled  by  a 
gradient  algorithm  to  produce,  with  suitable  values  of  the 
adaptation  gains,  the  desired  effect  of  pole  convergence 
followed  by  zero  convergence. 

The optimal values of the adaptation gains are a compromise between two contradicting considerations. The first factor is the requirement that the poles converge faster, and calls for low $\mu$. The second factor is a desire to maintain enough randomness in the weight adaptation process so that we have a reasonable probability of transition from a local minimum zone to the global minimum zone. The effect of the latter factor depends upon the specific shape of the performance surface.

The internal signal $\phi(k)$ has, at least during the pole convergence, non-stationary characteristics. In particular, the magnitude variations of $\phi(k)$ are important to the adaptation operation in the all zero section. As discussed in section 2.6 the FSC algorithm has no dynamic range limitations, so that it is ideally suited for the RGS algorithm.

The  RGS  algorithm  is  presented  in  three  ways:   Fig.  3.4-2 
presents  its  flow  diagram,  Fig.  3.4-3  presents  its  block 
diagram,  and  Appendix  B  is  a  FORTRAN  realization  of  the  RGS 
algorithm.   Fig.  3.4-4  presents  typical  operation  of  this 
algorithm  and  shows  the  convergence  and  steady-state  operation 
of  the  filter.   Some  further  analysis  and  more  simulation 
results  are  included  in  the  next  section. 
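
The cascade structure described in this section can be sketched for fixed weights as follows. This is an illustration only: the adaptation of the weights (random search on the b's, gradient search on the a's) is deliberately left out, Fig. 3.4-1 is not reproduced here, and the function name and the sign convention of the feedback weights (taken from (3.1-1)) are assumptions.

```python
def cascade_filter(x, b, a):
    """All-pole section followed by an all-zero section, fixed weights.

    phi(k) = x(k) + sum_{i=1..Nb} b[i-1] * phi(k-i)   (feedback weights)
    y(k)   = sum_{i=0..Na-1} a[i] * phi(k-i)          (feedforward weights)
    """
    phi, y = [], []
    for k in range(len(x)):
        acc = x[k]
        for i, bi in enumerate(b, start=1):
            if k - i >= 0:
                acc += bi * phi[k - i]
        phi.append(acc)
        out = 0.0
        for i, ai in enumerate(a):
            if k - i >= 0:
                out += ai * phi[k - i]
        y.append(out)
    return phi, y


# Usage: the zero at 0.5 cancels the pole at 0.5, leaving a pure impulse.
impulse = [1.0, 0.0, 0.0, 0.0]
phi, y = cascade_filter(impulse, b=[0.5], a=[1.0, -0.5])
```

For fixed weights the cascade has the same transfer function as the direct form (3.1-1). The advantage of the cascade is that the internal signal $\phi(k)$ is available as the input of a separate all-zero adaptive section.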

3.5   CONVERGENCE OF THE RGS IIR FILTER

Convergence of the RGS IIR filter is composed of two processes: the random search on the feedback weights, and the gradient search on the feedforward weights. These processes are coupled through the cascade structure of the filter, the common error expression and the dependency of the feedforward weights on the pole magnitudes, as discussed in section 3.2. In order to analyze this situation we first assume independent operation and analyze each of the sections separately. We then combine the convergence time estimates with a correction factor to account for the fact that both processes converge simultaneously.

Consider first the analysis of the random search algorithm used in the RGS IIR filter in a general environment. In the simple case of a single parameter, W, define the convergence zone:

$$|W_\ell - W^*| \le \Delta W \tag{3.5-1}$$

where:

$W_\ell$ is the parameter, W, at the $\ell$th random search interval.

$W^*$ is the optimal value of W (the global minimum of the performance surface).

$\Delta W > 0$ is the limit of the convergence zone around $W^*$.

A test point $\tilde W_\ell$ is selected by:

$$\tilde W_\ell = W_\ell + \mu g \tag{3.5-2}$$

where:

$\tilde W_\ell$ is the test point in the random search's $\ell$th interval.

$\mu > 0$ is the convergence control parameter (adaptation gain).

g is a number generated by an N(0,1) random number generator (zero mean, unit variance).

The probability density function of $\tilde W_\ell$ given $W_\ell$ is:

$$P(\tilde W_\ell \mid W_\ell) = \frac{1}{\sqrt{2\pi}\,\mu}\exp\Big[-\frac{(\tilde W_\ell - W_\ell)^2}{2\mu^2}\Big] \tag{3.5-3}$$

and the probability of selecting a test point in the convergence zone is:

$$P_\ell = \Pr\big[\,|\tilde W_\ell - W^*| \le \Delta W \mid W_\ell\,\big] = \int_{W^*-\Delta W}^{W^*+\Delta W} P(\tilde W_\ell \mid W_\ell)\,d\tilde W_\ell \tag{3.5-4}$$

Fig. 3.5-1 illustrates the situation and the probabilities defined above.
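
Because the density (3.5-3) is Gaussian, the integral (3.5-4) can be written with the error function. The following minimal sketch evaluates it with the Python standard library. The function name and the numerical values are illustrative assumptions.

```python
import math


def selection_probability(w_center, w_opt, delta_w, mu):
    """Integral (3.5-4) of the Gaussian density (3.5-3) over the zone."""
    def cdf(v):
        # CDF of a normal variable with mean w_center and standard deviation mu.
        return 0.5 * (1.0 + math.erf((v - w_center) / (mu * math.sqrt(2.0))))
    return cdf(w_opt + delta_w) - cdf(w_opt - delta_w)


# Center already at the optimum, zone half-width equal to mu.
p_at_opt = selection_probability(0.0, 0.0, 0.1, 0.1)
# Center three steps (3 * mu) away from the optimum.
p_far = selection_probability(0.3, 0.0, 0.1, 0.1)
```

The selection probability falls off quickly once the center is several multiples of $\mu$ away from $W^*$, which is why a very small $\mu$ makes a direct jump into a distant convergence zone unlikely.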

Selecting a correct value for the test point $\tilde W_\ell$ is not enough. After the testing of this point we need a correct decision that the tested point is better than $W_\ell$. Thus we have the estimates $\hat J_\ell$ (center) and $\hat{\tilde J}_\ell$ (test point), and the selection of the correct weight depends upon their comparison. The probability of a correct decision depends upon the values of $J_\ell$ and $\tilde J_\ell$, their difference, and the estimation parameters, mainly R. To simplify the analysis we define $P_{CD}(\ell)$ as the probability of a correct decision given $W_\ell$, averaged over all possible values of $\tilde W_\ell$.

We can write the probability of convergence to the convergence zone (3.5-1), $\theta_{\ell+1}$, at the $(\ell+1)$th random search interval, given $W_\ell$, as:

$$\theta_{\ell+1} = \Pr\big[\text{selection of } \tilde W_\ell \text{ in the convergence zone}\big]\cdot\Pr\big[\text{correct decision, namely } W_{\ell+1} = \tilde W_\ell \mid W_\ell\big] = P_\ell \cdot P_{CD}(\ell) \tag{3.5-5}$$

The probability, $Q_\ell$, that the process does not converge in the first $\ell$ iterations is given by:

$$Q_\ell = \Pr[\text{no convergence in the 1st R.S. interval}]\cdot\Pr[\text{no conv. in the second interval}]\cdots\Pr[\text{no conv. in the } \ell\text{th interval}] = \prod_{i=1}^{\ell}[1-\theta_i] \tag{3.5-6}$$

The probability, $P_\ell$, that the process does converge in the first $\ell$ iterations (a cumulative probability, distinct from the single-interval selection probability of (3.5-4)) is given by:

$$P_\ell = 1 - Q_\ell = 1 - \prod_{i=1}^{\ell}[1-\theta_i] \tag{3.5-7}$$


So far we have discussed the single parameter case. It is now convenient to introduce the general case, namely M parameters. We can define the multidimensional convergence zone as given by:

$$\|W_\ell - W^*\| \le \Delta W \tag{3.5-8}$$

where:

$W_\ell$ is a vector of M parameters, $W_{i,\ell}$, i=1,...,M.

$W^*$ is a vector of optimal values, $W_i^*$, i=1,...,M.

$\Delta W$ is a vector of deviations from the optimal values of the parameters defining the convergence zone.

$\|\cdot\|$ is a norm defined on the parameter space.

The multidimensional version of (3.5-2) is given by:

$$\tilde W_\ell = W_\ell + \mu G \tag{3.5-9}$$

where:

G is a vector of M independent random numbers, each of which is N(0,1).

Because of the independence of the parameters a multidimensional version of (3.5-4) is given by:

$$\phi_\ell = \prod_{i=1}^{M} P_{i,\ell} \tag{3.5-10}$$

where $P_{i,\ell}$ is the single parameter probability given by (3.5-4).

The probability of convergence to the convergence zone at the $(\ell+1)$th iteration given $W_\ell$ is given by:

$$\theta_{\ell+1} = \phi_\ell \cdot P_{CD}(\ell) \tag{3.5-11}$$

Equations (3.5-6) and (3.5-7) remain the same for the multiparameter case. An essential property of any optimization algorithm is its ability to converge to the optimum. We will now prove that the random search algorithm used in the RGS filter converges to the convergence zone defined in (3.5-8). To observe this point we examine equation (3.5-7). Since $(1-\theta_i)$ is a number always less than 1, the product $\prod_{i=1}^{\ell}(1-\theta_i)$ becomes smaller as $\ell$ increases. Thus

$$P_\infty = \lim_{\ell\to\infty} P_\ell = 1 - \lim_{\ell\to\infty}\Big\{\prod_{i=1}^{\ell}[1-\theta_i]\Big\} = 1 - 0 = 1 \tag{3.5-12}$$

(The limit of the product is zero provided the $\theta_i$ do not decay to zero too quickly, for example when they stay above some fixed positive value.) $P_\infty = 1$ means that after enough time the process converges to the convergence zone of the global minimum; that is, convergence with probability 1. Equation (3.5-12) does not provide quantitative information, namely an estimate of the convergence time. This problem is treated later in this section.
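
The approach of the cumulative probability (3.5-7) to unity can be illustrated numerically. The sketch below assumes, purely for illustration, a constant per-interval convergence probability $\theta_i = 0.1$. In the actual process the $\theta_i$ vary with the position of the center.

```python
def convergence_probability(thetas):
    """Cumulative probability 1 - prod(1 - theta_i) of equation (3.5-7)."""
    q = 1.0
    for theta in thetas:
        q *= 1.0 - theta
    return 1.0 - q


# Hypothetical constant per-interval probability of 0.1.
p10 = convergence_probability([0.1] * 10)
p100 = convergence_probability([0.1] * 100)
```

With this assumed value the probability of having converged is about 0.65 after 10 random search intervals and exceeds 0.9999 after 100 intervals.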

Fig. 3.5-2 presents the results of a parameter identification experiment, the details of which are presented in Appendix A. This example was taken from reference [11] where it was used to demonstrate how Feintuch's algorithm converges to a point on the performance surface which is not a minimum. Ref. [13] uses the same example to demonstrate how Stearns' algorithm converges to either a local or the global minimum depending upon the initialization point. Our results show that the RGS IIR filter converges to the global minimum even when started in a local minimum point. Fig. 3.5-2 presents two experiments with typical convergence.

Fig. 3.5-2
RGS IIR filter transition from a local to the global minimum for a parameter identification example. The figure shows the results of two independent experimental runs. See Appendix A for details of the simulation.

The adaptation of a random search process has two types of parameter changes, or steps:

- zone transitions - where after the movement, the center is near a new minimum

- small steps - where after the movement, the center is near the same minimum.

Fig. 3.5-3 illustrates the two step types. In signal filtering it turns out that the performance surface values at the local minima are typically much higher than at the global minimum, as shown in Fig. 3.5-4. This difference in the performance surface value means that the random search process, when comparing values of the performance surface estimates, is insensitive to the local minima. In terms of step types, we neglect the analytically complex zone transition steps and analyze the situation typical of signal processing applications, assuming convergence with small steps only.

We can define, for the general case of equation (3.5-9), the average step size, $S_{av}$, as:

$$S_{av} = E[W_{i,\ell+1} - W_{i,\ell}] \tag{3.5-13}$$

where $W_{i,\ell+1}$ and $W_{i,\ell}$ are the ith components of $W_{\ell+1}$ and $W_\ell$ respectively.

Using (3.5-2) as the equation for each component of (3.5-9), with $\tilde W_{i,\ell}$ the ith component of the test point, we have

$$\tilde W_{i,\ell} - W_{i,\ell} = \mu g \tag{3.5-14}$$


Fig. 3.5-3

Random Search Convergence. Zone Transitions: 7 to 8, 13 to 14. Small Steps: the Other Transitions. (Plot of J(W) versus W; the marked points are the center values at successive random search intervals, with the steady-state jitter indicated.)

Fig. 3.5-4

Global and Local Minimum, Signal Processing Example. (Plot annotations: frequency response of filter with weights that produce a local minimum MSE; frequency response of filter with weights that produce the global minimum MSE; desired center frequency of the filter.)


Since in small step convergence the direction of convergence is always the same, towards $W^*$, on the average only half of the test points are accepted, that is those where $\mu g \ge 0$ (or those with $\mu g < 0$). This assumes no decision errors when comparing two points. Using the above reasoning, from (3.5-13) and the definition of g as N(0,1) we get:

$$S_{av} = \int_0^\infty \mu\,g\,P(g)\,dg = \mu\int_0^\infty \frac{g}{\sqrt{2\pi}}\,e^{-g^2/2}\,dg = \frac{\mu}{\sqrt{2\pi}} \tag{3.5-15}$$
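
The closed form in (3.5-15) can be checked by direct numerical integration. This is a minimal sketch using the midpoint rule. The upper integration limit, the number of subintervals and the value $\mu = 0.1$ are arbitrary choices made for the illustration.

```python
import math


def average_step(mu, upper=10.0, n=100000):
    """Midpoint-rule approximation of the integral in (3.5-15)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        g = (i + 0.5) * h
        # Integrand: mu * g times the standard normal density at g.
        total += mu * g * math.exp(-0.5 * g * g) / math.sqrt(2.0 * math.pi)
    return total * h


s_av = average_step(0.1)
closed_form = 0.1 / math.sqrt(2.0 * math.pi)
```

Both values agree at about $0.4\mu$, so on the average the center advances by roughly 40% of the search range per random search interval under the stated assumptions.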

The above analysis ignores the coupling of the M parameters through the common error expression. At any interval one of the parameters is dominant, that is, it contributes to the error term more than the others. As a result the correction of the value of this parameter dominates even though the changes in the values of the other parameters may be in the wrong direction.

Practically the value of a parameter can be expected to jitter around some value until this parameter becomes the dominant one. Then its value would be corrected (and other parameters would jitter). If all the parameters were dominant for equal portions of the process the average step size would be 1/M of that given by (3.5-15), but parameters with larger numerical value get more attention, and the reduction in the average step size is given by:

$$S_{av}^{D} = \frac{1}{M^{\alpha_0}}\cdot\frac{\mu}{\sqrt{2\pi}} \tag{3.5-16}$$

$S_{av}^{D}$ is the average step size for the dominant, large valued, parameter. $0 < \alpha_0 < 1$ is an unknown factor.

Assuming that there is a correct decision, the mean number of random search intervals needed for convergence is $W_{max}/S_{av}^{D}$, where $W_{max}$ is the value of the largest parameter;

$$TC = \alpha_1\,\frac{W_{max}}{S_{av}^{D}}\cdot R \tag{3.5-17}$$

where $\alpha_1$ is a proportionality constant that depends upon the exact definition of TC (10%, $e^{-1}$ of initial error, etc.).

TC in (3.5-17) is given in filter iterations. We can combine (3.5-16) and (3.5-17) and include $\sqrt{2\pi}$ in $\alpha_1$ to yield:

$$TC = \alpha_1\cdot\frac{W_{max}}{\mu}\cdot M^{\alpha_0}\cdot R \tag{3.5-18}$$

Equation (3.5-18) provides a convergence time estimate for a general case of random search operation with the above assumptions. Let us turn now to the RGS IIR filter and get a specific expression for its convergence time estimate. For the RGS IIR filter case equation (3.5-18), in terms of the algorithm parameters as defined in section 3.3, becomes

$$TC_b = \alpha_1\,\frac{b_{max}}{\mu_b}\,N_b^{\alpha_0}\,R \tag{3.5-19}$$

To get an estimate of $b_{max}$ we consider the transfer function of the filter given by equation (3.1-1):

$$H(z) = \frac{\sum_{i=0}^{N_a-1} a_i z^{-i}}{1-\sum_{i=1}^{N_b} b_i z^{-i}} = \frac{\sum_{i=0}^{N_a-1} a_i z^{-i}}{\prod_{j=1}^{N_s}\big(1 - 2p_j\xi_j z^{-1} + p_j^2 z^{-2}\big)} \tag{3.5-20}$$

where:

$N_s$ is the number of 2nd-order sections of the filter, $N_s = N_b/2$.

$p_j$ is the magnitude of the jth pole.

$\xi_j = \cos(2\pi f_j/f_s)$

$f_j$ - pole frequency, $f_s$ - the sampling frequency.

For a stable filter $p_j < 1$, and the largest possible feedback weight, $b_{max}$, is given by addition of the terms $2p_j\xi_j$, which is the coefficient of $z^{-1}$ when the multiplication of (3.5-20) is expanded:

$$b_{max} = \sum_{j=1}^{N_s} 2p_j\xi_j \le \sum_{j=1}^{N_s} 2 = 2N_s = 2(N_b/2) = N_b \tag{3.5-21}$$
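
The bound (3.5-21) can be checked by expanding the denominator numerically. The sketch below assumes that each conjugate pole pair contributes the factor $1 - 2p_j\xi_j z^{-1} + p_j^2 z^{-2}$ and that the denominator is written as $1 - \sum_i b_i z^{-i}$. The pole values and the helper names are hypothetical.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def feedback_weights(poles):
    """Expand the product of second-order sections and return b_1..b_Nb.

    poles: list of (magnitude p_j, normalized frequency f_j / f_s).
    The denominator is 1 - sum_i b_i z^-i, hence the sign change.
    """
    den = [1.0]
    for p, f_norm in poles:
        xi = math.cos(2.0 * math.pi * f_norm)
        den = poly_mul(den, [1.0, -2.0 * p * xi, p * p])
    return [-c for c in den[1:]]


# Two hypothetical second-order sections, so N_b = 4.
poles = [(0.95, 0.05), (0.9, 0.10)]
b = feedback_weights(poles)
b1_expected = sum(2.0 * p * math.cos(2.0 * math.pi * f) for p, f in poles)
```

For these assumed poles the first feedback weight equals the sum of the terms $2p_j\xi_j$ and stays below $N_b = 4$, in agreement with (3.5-21).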


Inserting (3.5-21) into (3.5-19) gives:

$$TC_b = \alpha_1\,\frac{N_b^{\alpha_2}}{\mu_b}\,R \tag{3.5-22}$$

where $\alpha_2 = \alpha_0 + 1$ is expected to be in the range $1 < \alpha_2 < 2$.

Equation (3.5-22) estimates the convergence time of the feedback portion of the RGS IIR filter. To estimate the convergence time of the feedforward portion we can use the FSC relation (2.3-1). As discussed in section 3.3, the largest value of a feedforward weight should be proportional to $(1-p)$:

$$a_{max} = \alpha_3\,(1-p) \tag{3.5-23}$$

where p is the pole's magnitude. In addition to the discussion of section 3.3, which relates the gain factor $a_0$ to the dominant pole magnitude p, it is also noted that the adaptation gain is related to the pole magnitude, because the smaller the adaptation gain the finer the control of the ratio $a_0/(1-p)$ and the closer the pole magnitude can be adjusted to the unit circle. Thus we write the following relationship:

$$(1-p) = \alpha_4\,\mu_a^{\alpha_5} \tag{3.5-24}$$

where $\alpha_4$ and $\alpha_5$ are proportionality constants. Fig. 3.5-5 shows the transient and steady state values of the pole magnitude for several values of $\mu_a$. The non-linear relationship between p and $\mu_a$, as suggested by (3.5-24), seems to be reasonable.

Combining  (2.3-1),  (2.3-4),  (3.5-23),  (3.5-24)  and  value 
of  a, =1/2  in  (2.3-1)  we  get: 


a    /N         cu 

TC   =  a  -5^ — -  =  ae    u   '  /N~~  (3.5-25) 

a       ya       6   a     a 


Fig. 3.5-5
Pole convergence and steady state value for several values
of $\mu_a$.   Average of 25 runs; all the experiments were with
$\mu_b = .1$ and R=500.

We now determine the convergence time for the RGS IIR filter
by adding (3.5-25) and (3.5-22):

$$TC = \alpha_6 \mu_a^{\alpha_7} \sqrt{N_a} + \alpha_1 \frac{N_b^{\alpha_2}}{\mu_b} R \tag{3.5-26}$$

where $\alpha_1$, $\alpha_2$, $\alpha_6$, $\alpha_7$ are experimentally evaluated constants.   The
values of $\alpha_1$ and $\alpha_6$ can be adjusted to compensate for the
simultaneous convergence of the all-pole and all-zero sections
of the RGS IIR filter.
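The step from (3.5-23) and (3.5-24) to the feedforward term can be written out explicitly. The sketch below assumes that the FSC relation (2.3-1), specialized as in (3.5-25), has the form $TC_a \propto a_{\max}\sqrt{N_a}/\mu_a$; the symbol $\alpha$ for the leading constant is a placeholder and not necessarily the notation of section 2.3.

```latex
\begin{align*}
TC_a &= \alpha\,\frac{a_{\max}\sqrt{N_a}}{\mu_a}
      = \alpha\,\frac{\alpha_3 (1-\rho)\sqrt{N_a}}{\mu_a}
      && \text{by (3.5-23)} \\
     &= \alpha\,\alpha_3\,\alpha_4\,\mu_a^{\alpha_5 - 1}\sqrt{N_a}
      && \text{by (3.5-24)} \\
     &= \alpha_6\,\mu_a^{\alpha_7}\sqrt{N_a},
      \qquad \alpha_6 = \alpha\,\alpha_3\,\alpha_4, \quad \alpha_7 = \alpha_5 - 1 .
\end{align*}
```

Under this reading a negative fitted exponent $\alpha_7$, such as the one in (3.5-30), corresponds to $0 < \alpha_5 < 1$.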

A special difficulty is to confirm the dependence
of the random search convergence time upon the
number of weights, $N_b$, in the random search process.   A
change in the number of feedback weights in the RGS IIR
filter (with or without changing the input signal) causes
major changes in the nature of the problem to be solved.   For
example, if there are more poles than necessary, only one of
them needs to converge, and the others are cancelled by the
zeros.   Any experiment in which the number of poles is varied
(with or without changing the signal) will combine the effects
of changes in the nature of the problem to be solved, the
effect of any changes in the input signal statistics, and
the effect of more parameters upon the random search
convergence time.   A simpler approach is to construct a random
search FIR filter and to use this filter to verify the analysis
of the random search process and the dependence of its
convergence upon the number of weights.

We start with relation (3.5-18) using FIR notation.   Thus

$$TC = \alpha_1 \frac{a_{\max}}{\mu_a} N_a^{\alpha_0} R \tag{3.5-27}$$

From Treichler [37] we have:

$$a_{\max} = \frac{\mathrm{SNR}}{1 + .5 N_a\,\mathrm{SNR}} \tag{3.5-28}$$

where SNR is the input signal to noise ratio.

Inserting (3.5-28) into (3.5-27) gives:

$$TC = \alpha_1 \frac{\mathrm{SNR} \cdot N_a^{\alpha_0} \cdot R}{(1 + .5 N_a\,\mathrm{SNR})\,\mu_a} \tag{3.5-29}$$

Fig. 3.5-6 presents a comparison of simulation results with the
convergence estimate (3.5-29) for several values of $N_a$, with $\alpha_0$
and $\alpha_1$ experimentally determined as $\alpha_0 = .4242$, $\alpha_1 = 1.557$.
These results verify the ability of equation (3.5-18) to
estimate the effect of the number of parameters on the
convergence time of a random search filter.
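The estimate (3.5-29) is simple to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, the values $\mu_a = .0075$, R=300 and SNR = -3 dB are taken from the caption of Fig. 3.5-6 (reading the garbled gain symbol there as $\mu_a$), and the conversion of the dB figure to a power ratio is an assumption.

```python
def snr_from_db(snr_db):
    """Convert an SNR in dB to a power ratio (assumed convention)."""
    return 10.0 ** (snr_db / 10.0)


def tc_random_search_fir(n_a, snr, mu_a, r, alpha0=0.4242, alpha1=1.557):
    """Convergence time estimate of Eq. (3.5-29)."""
    a_max = snr / (1.0 + 0.5 * n_a * snr)                  # Eq. (3.5-28)
    return alpha1 * (a_max / mu_a) * n_a ** alpha0 * r      # Eq. (3.5-27)


snr = snr_from_db(-3.0)
estimates = {n_a: tc_random_search_fir(n_a, snr, mu_a=0.0075, r=300)
             for n_a in (2, 4, 8, 16)}
for n_a, tc in estimates.items():
    print(n_a, round(tc))
```

Because $a_{\max}$ in (3.5-28) shrinks as $N_a$ grows, the estimate is not monotonic in the number of weights; the exponent $\alpha_0$ and the factor $1 + .5 N_a\,\mathrm{SNR}$ pull in opposite directions.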

An important assumption used in the convergence analysis
of the RGS IIR filter was that the random search process does
not make mistakes when comparing the performance function, J,
of the tested filter with that of the center filter.   In order
to define conditions for filter operation with no decision
mistakes we investigate the effects of the random search
interval, R, on the RGS filter performance.   Fig. 3.5-7
presents operation with decision mistakes.   This situation is
typical of operation with a relatively small random search
interval.   In the example of Fig. 3.5-7 we used R = 100.
For larger values of the random search interval, R,
we have slower convergence, as given by equation (3.5-26) and
illustrated by the pole convergence of Fig. 3.5-8.   Fig. 3.5-8
also shows that when the random search interval is too large
(R = 800 in the example of Fig. 3.5-8) the pole magnitude
is smaller and the resulting signal processing gain is lower.
This unwanted effect is caused by decision mistakes that occur
because, with a long random search interval, the feedforward
weights provide a better match for the weights in the center
filter than for the tested filter, for which the
feedforward weights are only copied (Fig. 3.4-3).   This effect
of smaller pole magnitude for a long random search interval
depends on the convergence of the all-zero section and hence
upon the value of $\mu_a$.

Fig. 3.5-6
Convergence time dependence upon the number of weights: simulation
and theory for a random search FIR filter, with $\mu_a = .0075$,
R=300, for SNR = -3 dB.

Fig. 3.5-7
RGS filter operation with decision mistakes.   Filter parameters
in this example are: $N_a=3$, $N_b=2$, $\mu_a = 3\times10^{-6}$, $\mu_b = .1$,
R=100.

Fig. 3.5-8
Effects of random search interval, R, on pole convergence
and pole's steady-state magnitude.

The above discussion suggests that there is an optimal
value for the random search interval, which depends upon the
value of $\mu_a$.   From the results obtained in our experiments
it seems that for $\mu_a$ in the range of $30\times10^{-6}$ to $10^{-6}$, the
optimal values of R are in the range 300 to 500 iterations.
To verify the convergence time estimate of equation (3.5-26)
we present experiments with several values of $\mu_a$ (Fig. 3.5-9)
and several values of $\mu_b$ (Fig. 3.5-10), all of them with R=500
to assure that the operation is practically free from decision
mistakes.

The effects of $\mu_a$, as discussed above, are clear in Fig.
3.5-9:

(1)  The steady-state value of the pole magnitude is
closer to 1 for smaller $\mu_a$.

(2)  The convergence rate of the feedforward weight is
proportional to $\mu_a$.

(3)  The overall effect on the convergence and steady-
state value of the output MSE is a combination of
(1) and (2).

The effects of $\mu_b$, as discussed above, are clear in Fig.
3.5-10:

(1)  The convergence rate of the pole is proportional to $\mu_b$.

(2)  The overall effect on the convergence of the output
MSE is mainly in the convergence rate.

It is interesting to note in Fig. 3.5-10 that the convergence
rate of the feedforward weight is equal for all the values
of $\mu_b$.

Fig. 3.5-9
RGS IIR filter operation for 3 values of $\mu_a$.   The results are
simulation averages of 32 runs with $\mu_b = .1$ and R=500.

Fig. 3.5-10
RGS IIR filter operation for 3 values of $\mu_b$.   Simulation
results are averages of 32 runs with $\mu_a = 3\times10^{-6}$ and R=500.

Table 3.5-1 compares the convergence time measured in
the experiments presented in Fig. 3.5-9 and Fig. 3.5-10 to
the estimation given by (3.5-26) with experimentally
determined proportionality constants.   The modified estimation
formula for $N_b = 2$ is:

$$TC = 203.12\,\mu_a^{-.3392} \sqrt{N_a} + 3.8\,R/\mu_b \tag{3.5-30}$$

Table 3.5-1 shows good agreement between experimental
measurements of RGS filter convergence time and the
estimations of (3.5-30).   This agreement verifies the analysis
of the RGS IIR filter convergence properties.
Table 3.5-1
RGS Convergence Time Measurement and Estimation
(Results are an average of 32 runs with $N_a=3$, $N_b=2$)

 #   Results                      Experiment Parameters        Convergence Time
     Presentation                 $\mu_a$      $\mu_b$   R     Measured   Estimation
                                                                          (Eq. 3.5-30)

 1   Fig. 3.5-9                   30x10^-6     .1        500   30,500     31,079
 2   Fig. 3.5-9                   10x10^-6     .1        500   41,000     36,517
 3   Fig. 3.5-9 and Fig. 3.5-10   3x10^-6      .1        500   57,000     45,334
 4   Fig. 3.5-10                  3x10^-6      .05       500   66,000     64,372
 5   Fig. 3.5-10*                 3x10^-6      .025      500   100,000    102,448

NOTE

*  Convergence time for #5 cannot be measured from Fig. 3.5-10
because only part of this experiment's data is plotted there.
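The estimation column of Table 3.5-1 can be recomputed from (3.5-30). The sketch below is a plain evaluation of that formula with the table's parameters; the function name `tc_rgs_estimate` is hypothetical. The recomputed values differ slightly from the tabulated ones, presumably because the published constants are rounded.

```python
import math


def tc_rgs_estimate(mu_a, mu_b, r, n_a):
    """Convergence time estimate of Eq. (3.5-30), stated for N_b = 2."""
    feedforward = 203.12 * mu_a ** (-0.3392) * math.sqrt(n_a)
    feedback = 3.8 * r / mu_b
    return feedforward + feedback


# (mu_a, mu_b, R, measured TC, tabulated estimate) from Table 3.5-1, N_a = 3
rows = [
    (30e-6, 0.1, 500, 30500, 31079),
    (10e-6, 0.1, 500, 41000, 36517),
    (3e-6, 0.1, 500, 57000, 45334),
    (3e-6, 0.05, 500, 66000, 64372),
    (3e-6, 0.025, 500, 100000, 102448),
]
recomputed = [tc_rgs_estimate(mu_a, mu_b, r, n_a=3)
              for mu_a, mu_b, r, _, _ in rows]
for (mu_a, mu_b, r, measured, tabulated), est in zip(rows, recomputed):
    print(f"{mu_a:8.1e} {mu_b:6.3f} {measured:8d} {tabulated:8d} {est:10.0f}")
```

Printing the measured and estimated columns side by side also makes the per-row deviation easy to inspect.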
3.6   APRIORI STRUCTURED ADAPTIVE FILTERS (ASAF)

Motivated by the possibility of reducing the number of
variables under random search adaptation, we now investigate
the relationship of the filter's weights to a smaller set of
variables.   In some cases the structure of the optimal filter
is known and only a few parameters are unknown and need to be
evaluated.   In other cases we might accept a simpler
sub-optimal solution, namely an optimal filter with a structural
constraint.   For the filter (3.1-1) we might have a smaller
set of parameters, W, such that:

$$a_i = f_i(W), \quad i = 0, 1, \ldots, N_a - 1 \tag{3.6-1}$$

$$b_i = g_i(W), \quad i = 1, 2, \ldots, N_b \tag{3.6-2}$$

where $f_i(\cdot)$ and $g_i(\cdot)$ are functions that connect each
filter weight to the parameter vector W.

Since we have a good solution to the feedforward weight
adaptation, it is acceptable to use the same combined random
and gradient search method that was used for the RGS IIR
filter for the proposed ASAF filter, as presented in Fig.
3.6-1.

We will continue the discussion by considering the specific
case of a pole close to the unit circle with adaptation
of its frequency only; see Fig. 3.6-2.

This type of adaptive filter is useful for the ALE
application, or when several such sections are cascaded to
enhance a multiple sine wave signal.

Fig. 3.6-3 presents a typical operation of an Apriori
Structured Pole (ASPOL) IIR adaptive filter with pole magnitude
$\rho = .99$.   Simulation details are presented in Appendix A.
It is clear from Fig. 3.6-3 that this filter has high
processing gain with relatively short convergence time.
These advantages are also shown in Fig. 1.5-1.
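For the moving-pole case, the functions $g_i(W)$ of (3.6-2) could take the following form if the section is a complex-conjugate pole pair of fixed magnitude, built from the second-order factor of (3.5-20). This is a sketch under that assumption: the text does not state the exact parameterization used in the simulation, and the names `structured_pole_weights` and `pole_magnitudes` are hypothetical.

```python
import cmath
import math


def structured_pole_weights(freq, fs, rho=0.99):
    """Map the single parameter W = pole frequency to (b1, b2).

    The denominator is 1 - b1*z^-1 - b2*z^-2 with a pole pair of
    magnitude rho at the given frequency.
    """
    xi = math.cos(2.0 * math.pi * freq / fs)
    return 2.0 * rho * xi, -rho * rho


def pole_magnitudes(b1, b2):
    """Magnitudes of the roots of z^2 - b1*z - b2 = 0."""
    disc = cmath.sqrt(b1 * b1 + 4.0 * b2)
    return abs((b1 + disc) / 2.0), abs((b1 - disc) / 2.0)


fs = 1000.0                                   # illustrative sampling rate
checks = {}
for freq in (50.0, 100.0, 250.0, 400.0):
    b1, b2 = structured_pole_weights(freq, fs)
    checks[freq] = pole_magnitudes(b1, b2)
    print(freq, b1, b2, checks[freq])
```

Whatever frequency the random search proposes, the pole magnitude stays at $\rho$, so a filter structured this way remains stable while only one parameter is adapted.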
3.7   CONCLUSION

An IIR filter has, in many cases, great advantages over
a FIR filter because of the efficiency associated with the use
of recursion and the existence of poles in the transfer
function.   However, realization of an adaptive IIR filter is a
difficult task because of the multimodal nature of the IIR
filter's performance surface, as well as the stability
problem and the complexity of the performance surface
gradient expression.

The proposed Random and Gradient Search (RGS) algorithm
overcomes the complexity of the multimodal performance
surface and converges to the global optimum with probability 1.
This convergence is guaranteed for sufficiently large time.
For the important special case of a large difference in the
value of the performance function between the global minimum
and the local minima, an estimate of the average convergence
time has been derived and verified by simulation results.

The convergence time estimate is given by:

$$TC = \alpha_6 \mu_a^{\alpha_7} \sqrt{N_a} + \alpha_1 \frac{N_b^{\alpha_2}}{\mu_b} R \tag{3.7-1}$$

where $\alpha_6$, $\alpha_7$, $\alpha_1$, $\alpha_2$ are experimentally evaluated constants.

Stable operation of the RGS IIR filter was demonstrated
in many hours of computer simulation without overflow
problems.   This stability is attributed to the detection of
excessive MSE at the tested points, and the fact that the
algorithm discards such points before overflow, which is
caused by unstable filter weights, can develop.   Thus
the RGS is a practical candidate to realize an adaptive IIR
filter.

The Apriori Structured Adaptive Filter (ASAF) uses the
random search method and additional structure information
to improve adaptive IIR filter performance.   The moving
pole example, for instance, is guaranteed to be stable and
has fewer parameters in the adaptation process.
IV.   SUMMARY

Two approaches to efficient adaptive filtering have been
investigated: a FIR filter with simplified gradient estimation
methods, and IIR filters with a combined random search
and gradient adaptation scheme.

Two simplified algorithms, the Fixed Step Correction (FSC)
and the Simplified LMS (SLMS), are derived and compared to the
classical LMS algorithm for the FIR filter.   The comparison
includes analysis of filter properties, and extensive
simulation results are presented to verify the analysis.

Because the adaptive filter properties depend upon the
statistics of the input signal, the desired signal, and the
reference signal, when analyzing the operation of an adaptive
filter one must assume some statistics for the above signals.
Hence, the analysis is valid for a specific case or a class
of cases.

Thus the algorithm comparisons and the analysis of the
adaptive filter properties have been carried out here for the
application of adaptive line enhancement (ALE).   The analysis
includes the convergence time estimate, steady-state misadjustment,
and filter processing gain.   Estimates of these properties
have been derived and verified by simulation results which
compare the three algorithms (LMS, FSC, SLMS).   The conclusion
of the comparison is that for equal filter order the LMS
algorithm is somewhat better.   However, when one considers
equal complexity, which allows the use of a higher order
filter for the simplified gradient estimations, the result is
that the FSC and the SLMS are better than the classical LMS.

The IIR filter offers, in many cases, computational
savings over a FIR filter.   However, the IIR filter has
a multimodal performance surface and may be unstable.   Because
of these two problems, and the complexity of the gradient
expression, the algorithms which have been proposed for
the IIR adaptive filter by Feintuch and Stearns do not provide
a satisfactory solution.   The Random and Gradient Search
algorithm (RGS) proposed here has the ability to converge
to the global minimum of the multimodal performance surface,
and convergence with probability one is guaranteed for
sufficiently large time.   For the important class of cases
characterized by a global minimum much lower than the local
minima, an average convergence time estimate has been
derived and verified with simulation results.   The use of
structure information about the optimal solution, when known,
allows the construction of an Apriori Structured Adaptive
Filter (ASAF).   This version of the RGS IIR filter optimizes
a smaller set of parameters and is advantageous in some
practical applications.   In summary, this research has demonstrated
that with the RGS scheme it is possible to realize an adaptive
IIR filter which will operate properly and have a practical
implementation.
At the end of this dissertation it is appropriate to
indicate some of the subjects that call for further work.
In the area of FIR adaptive filters with simplified gradient
estimation, some topics are:

-  Effects of finite arithmetic.

-  The study of variations of convergence and steady-
state behavior around the mean.

-  The dependence of convergence and steady-state behavior
on input signal to noise ratio.

-  Operation with complicated signals.

-  Extension of the discrete algorithms developed here to
analog systems, including adaptive antenna arrays.

-  Consideration of non-stationary input signals.

-  Applications.

In the area of RGS IIR filters, topics include:

-  The study of random search decision error dependence
upon filter parameters.

-  Analysis of possible processing gain dependence upon
filter parameters.

-  The effects of operation with an inaccurate random
number generator.

-  Possible configurations for apriori structured adaptive
filters.

-  Operation with complicated signals.

-  Consideration of non-stationary input signals.

-  Applications.

APPENDIX A
SIMULATION

A.1   THE SIMULATION METHOD

Simulation was used to provide experimental data for
comparison of adaptive filtering algorithms, and for
verification of analytic formulas.   The simulation program includes
four basic functions:

(1)  Execution initialization:   Signal parameters (frequency,
signal to noise ratio, power, etc.) and filter
parameters (filter type, number of weights, adaptation gain,
etc.) are loaded interactively into the computer to
control the forthcoming execution.

(2)  Experiment configuration and signal generation:
A signal generator subroutine, determined by the signal
parameters loaded in the execution initialization, prepares
sequences of 100 samples of input and reference signals to
be processed by a filter subroutine.   The relationship of the
input and reference samples determines the adaptive filter
simulation to be evaluated, i.e. ALE or parameter
identification.

(3)  The filtering function:   As controlled by the filter
parameters loaded during the execution initialization, a
filter subroutine is called upon to process the data in blocks
of 100 samples.

(4)  Experiment data extraction and storage:   The program
stores the values of J(k) and 200 values (spread equally
over the time of the experiment) of up to six parameters,
and at the end of the experiment the program evaluates the
processing gain and the convergence time for that experiment.
The foregoing information, together with the experiment
parameters, is stored on disk files and is available for
off-line use.

A simplified flow diagram of the simulation program is
presented in Fig. A.1-1.   It should be noted that although
the FIR and IIR simulation programs have identical structure
there are some differences, as discussed in sections A.2 and
A.3.

A data handling program is used to access the data files
and present experimental results in tabular or graphic form.
The graphic option includes plots of variables as functions
of time and as functions of a filter or signal parameter.

The simulation was done on a PDP-11/50 minicomputer under
the RSX-11M multiuser operating system.

Fig. A.1-1
Simulation Simplified Flow Diagram


A.2   FIR SIMULATION PROGRAM

The FIR simulation uses the Adaptive Line Enhancement
(ALE) configuration as presented in Fig. A.2-1.   The desired
signal is:

$$s(k) = \sqrt{2 R_{ss}(0)}\,\cos \omega k \tag{A.2-1}$$

where $R_{ss}(0)$ is the desired signal power, and:

$$x(k) = s(k) + n(k)$$

where n(k) is white gaussian noise with variance $R_{nn}(0)$.
The execution initialization controls the parameters $R_{ss}(0)$,
$R_{nn}(0)$, and $\omega$.   For each $N_a$ and signal statistics used, the
program evaluates the optimal values of the mean squared
error, $J_{\min}$, and weights $a_i^*$ as follows:

$$J_{\min} = \lim_{\substack{k \to \infty \\ \mu_a \to 0}} J(k) \tag{A.2-2}$$

$$a_i^* = \lim_{\substack{k \to \infty \\ \mu_a \to 0}} a_i(k) \quad \text{for } i = 0, 1, \ldots, N_a - 1 \tag{A.2-3}$$

For 200 points spread equally over the time of the experiment
the program evaluates the performance function, J(k):

$$J(k) = J_{\min} + [A_k - A^*]^T R\,[A_k - A^*] \tag{A.2-4}$$

where

$A_k$ is the vector of filter weights at time k.

$A^*$ is the vector of optimal weights determined by (A.2-3).

R is the input signal autocorrelation matrix.

At the end of the experiment the program evaluates:

(1)  The steady-state MSE, $J_{ss}$, as the average value of
J(k) in the last 10% of the experiment.

(2)  Convergence time, TC, directly from the definition
as the time required for the error $(J(k) - J_{ss})$ to be
reduced to 10% of its initial value.

(3)  Misadjustment, M:

$$M = \frac{J_{ss} - J_{\min}}{J_{\min}} \tag{A.2-5}$$

(4)  Processing gain, PG:

$$PG = 10 \log \left[\frac{R_{nn}(0)}{J_{ss}}\right] \tag{A.2-6}$$

The program includes filter subroutines that perform the LMS,
FSC, and the SLMS algorithms.
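The four end-of-experiment quantities listed above can be sketched as follows. This is an illustration of the stated definitions, not the original program: the function names are hypothetical, the logarithm in (A.2-6) is assumed to be base 10, the numerator of (A.2-6) is taken to be the input noise power $R_{nn}(0)$ (a reading of the source rather than a certainty), and the exponential trace is synthetic test data.

```python
import math


def steady_state_mse(j):
    """Average of J(k) over the last 10% of the recorded points."""
    tail = j[-max(1, len(j) // 10):]
    return sum(tail) / len(tail)


def convergence_time(j, times, j_ss):
    """First recorded time at which J(k) - J_ss is at most 10% of its initial value."""
    threshold = 0.1 * (j[0] - j_ss)
    for t, value in zip(times, j):
        if value - j_ss <= threshold:
            return t
    return None


def misadjustment(j_ss, j_min):
    """Eq. (A.2-5)."""
    return (j_ss - j_min) / j_min


def processing_gain_db(noise_power, j_ss):
    """Eq. (A.2-6), assuming a base-10 logarithm and R_nn(0) in the numerator."""
    return 10.0 * math.log10(noise_power / j_ss)


# Synthetic learning curve: 200 points, exponential decay toward 0.055.
tau = 2000.0
times = [100 * n for n in range(200)]
trace = [0.055 + 0.945 * math.exp(-t / tau) for t in times]

j_ss = steady_state_mse(trace)
tc = convergence_time(trace, times, j_ss)
m = misadjustment(j_ss, j_min=0.05)
pg = processing_gain_db(noise_power=1.0, j_ss=j_ss)
print(j_ss, tc, m, pg)
```

With only 200 stored points, the measured convergence time is quantized to the spacing between points, which bounds the resolution of any TC read from such a record.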
A.3   IIR SIMULATION PROGRAM

Three configurations were used for IIR adaptive filter
simulation:

-  ALE configuration as discussed in Section A.2.   Differences
from the FIR program are indicated later.

-  Parameter identification example as shown in Fig.
A.3-1.   This example was used to demonstrate convergence
to the global minimum of a multimodal performance
surface.   This example is taken from Johnson and
Larimore [11] and was also used by Parikh and Ahmed [13].

-  Stability test, as shown in Fig. A.3-2.   This test was
used with the optimal location for the pole near the
unit circle to demonstrate the stability of the Feintuch
and random search algorithms, and the lack of
stability of Stearns' algorithm.

Fig. A.3-1
Parameter Identification Example

Fig. A.3-2
Stability Test

The ALE IIR experiments are similar to the FIR experiments
except for the following:

(1)  The optimal values of (A.2-2) and (A.2-3) are not
used.

(2)  The MSE, J(k), is evaluated by:

$$J(k) = \frac{1}{N_{av}} \sum_{j=0}^{N_{av}-1} [y(k-j) - s(k-j)]^2 \tag{A.3-1}$$

where $N_{av}$ is an averaging interval; the values used were
between 100 and 500.   J(k) is evaluated only once in
each averaging interval.

(3)  The misadjustment, M, of (A.2-5) is not used.

The program includes filter subroutines that use the
following IIR algorithms:

-  Feintuch

-  Stearns

-  Random Search

-  RGS

-  Apriori Structured Pole.

In order to present the algorithm details and as an example
of a filter subroutine, the RGS subroutine is given in
Appendix B.
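The block-averaged MSE of (A.3-1) can be sketched in a few lines. The function name `block_mse` and the sample data are illustrative only; the sketch assumes J is evaluated once at the end of each complete averaging interval.

```python
def block_mse(y, s, n_av):
    """Evaluate J once per averaging interval of n_av samples, as in (A.3-1)."""
    values = []
    for end in range(n_av, len(y) + 1, n_av):
        block = range(end - n_av, end)
        values.append(sum((y[k] - s[k]) ** 2 for k in block) / n_av)
    return values


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # filter output samples
s = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0]      # desired signal samples
j = block_mse(y, s, n_av=3)
print(j)
```

Unlike (A.2-4), this estimate needs neither the optimal weights nor the autocorrelation matrix, which is why it suits the IIR experiments where those are not computed.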
APPENDIX B

      SUBROUTINE RGS
C
C*********************************************************
C                                                        *
C     RANDOM AND GRADIENT SEARCH SUBROUTINE              *
C                                                        *
C     X(100)   INPUT DATA IN 100 ELEMENTS ARRAY          *
C     R(100)   REFERENCE DATA                            *
C     Y(100)   OUTPUT DATA                               *
C     A( . )   FEEDFORWARD WEIGHTS                       *
C     B( . )   FEEDBACK WEIGHTS                          *
C                                                        *
C*********************************************************
C
C     THE COMMON BLOCK
      INCLUDE 'AR.CMN'
      DO 100 K=1,100
C     FEEDBACK SECTION PROCESSING.
C     BT(.) IS THE TESTED POINT.
      YN=X(K)
      YTN=X(K)
      DO 101 J=1,NB
      YN=YN+B(J)*YB(J)
      YTN=YTN+BT(J)*YT(J)
101   CONTINUE
C     SHIFTING THE SIGNAL IN THE FILTER'S MEMORY
      DO 102 J=1,NAB-1
      YB(NAB-J+1)=YB(NAB-J)
      YT(NAB-J+1)=YT(NAB-J)
102   CONTINUE
      YB(1)=YN
      YT(1)=YTN
C     FEEDFORWARD SECTION PROCESSING
      Y(K)=0.0
      ZT=0.0
      DO 103 J=1,NA
      Y(K)=Y(K)+A(J)*YB(J)
      ZT=ZT+A(J)*YT(J)
103   CONTINUE
C     THE ERROR TERMS
      ER=Y(K)-R(K)
      ET=ZT-R(K)
C     FEEDFORWARD WEIGHTS' ADAPTATION
      DO 104 J=1,NA
      A(J)=A(J)-GA*SIGN(1.0,ER)*SIGN(1.0,YB(J))
104   CONTINUE
C     PERFORMANCE FUNCTION ESTIMATION
      EC=EC+ER*ER
      ECT=ECT+ET*ET
      IF(ECT.GT.TH) L=LL
      L=L+1
      IF(L.LE.LL) GO TO 111
C     COMPARISON AND RANDOM SEARCH DECISION MAKING
      L=0
      IF(ECT.LT.EC) GO TO 200
      GO TO 201
200   CONTINUE
      DO 203 J=1,NB
      B(J)=BT(J)
      YB(J)=YT(J)
203   CONTINUE
201   CONTINUE
C     NEW TEST POINT SELECTION
      DO 204 J=1,NB
      BT(J)=B(J)+GB*GAUSS(0)
204   CONTINUE
      DO 205 J=1,NAB
      YT(J)=YB(J)
205   CONTINUE
      EC=0.0
      ECT=0.0
111   CONTINUE
C     EXPERIMENT DATA EXTRACTION
      CALL AVERR(K)
      IF(IOUT.EQ.2) CALL WTPR
100   CONTINUE
      RETURN
      END
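For readers who prefer a modern notation, the listing can be paraphrased sample by sample as below. This is a sketch under stated assumptions, not the original program: the contents of the 'AR.CMN' common block are not given, so the initial values, the reading of NAB as max(NA, NB), and the use of a unit-variance Gaussian in place of GAUSS(0) are all assumptions. The class name `RGSFilter`, the driver signal, and the parameter values are illustrative, and the driver does not reproduce the ALE wiring of Appendix A.

```python
import math
import random


def sign(v):
    """Mimics Fortran SIGN(1.0, v): +1.0 for v >= 0, otherwise -1.0."""
    return 1.0 if v >= 0.0 else -1.0


class RGSFilter:
    """Sample-by-sample paraphrase of the RGS listing (names follow the Fortran)."""

    def __init__(self, n_a, n_b, ga, gb, ll, th, seed=0):
        self.n_a, self.n_b = n_a, n_b
        self.ga, self.gb = ga, gb            # adaptation gains GA, GB
        self.ll, self.th = ll, th            # search interval LL, MSE threshold TH
        self.rng = random.Random(seed)       # stands in for GAUSS(0)
        nab = max(n_a, n_b)                  # assumed meaning of NAB
        self.a = [0.0] * n_a                 # feedforward weights A
        self.b = [0.0] * n_b                 # center feedback weights B
        self.bt = [gb * self.rng.gauss(0.0, 1.0) for _ in range(n_b)]
        self.yb = [0.0] * nab                # center filter memory YB
        self.yt = [0.0] * nab                # tested filter memory YT
        self.ec = self.ect = 0.0             # accumulated squared errors
        self.l = 0                           # sample counter L

    def step(self, x, r):
        # Feedback (all-pole) sections of the center and tested filters.
        yn = x + sum(bj * yj for bj, yj in zip(self.b, self.yb))
        ytn = x + sum(bj * yj for bj, yj in zip(self.bt, self.yt))
        self.yb = [yn] + self.yb[:-1]
        self.yt = [ytn] + self.yt[:-1]
        # Feedforward (all-zero) section, shared by both filters.
        y = sum(aj * yj for aj, yj in zip(self.a, self.yb))
        zt = sum(aj * yj for aj, yj in zip(self.a, self.yt))
        er, et = y - r, zt - r
        # Fixed-step correction of the feedforward weights (signs only).
        for j in range(self.n_a):
            self.a[j] -= self.ga * sign(er) * sign(self.yb[j])
        self.ec += er * er
        self.ect += et * et
        if self.ect > self.th:               # excessive MSE: decide at once
            self.l = self.ll
        self.l += 1
        if self.l > self.ll:
            self.l = 0
            if self.ect < self.ec:           # tested point wins: move center
                self.b = list(self.bt)
                self.yb[:self.n_b] = self.yt[:self.n_b]
            self.bt = [bj + self.gb * self.rng.gauss(0.0, 1.0) for bj in self.b]
            self.yt = list(self.yb)
            self.ec = self.ect = 0.0
        return y


filt = RGSFilter(n_a=3, n_b=2, ga=1e-3, gb=0.01, ll=50, th=1e6)
noise = random.Random(1)
outputs = []
for k in range(2000):
    s = math.sqrt(2.0) * math.cos(0.2 * math.pi * k)
    outputs.append(filt.step(s + noise.gauss(0.0, 1.0), s))
```

As in the listing, the feedforward weights are adapted from the center filter's error only, and the tested filter reuses them; the threshold test forces an immediate decision when the tested filter's accumulated error becomes excessive.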
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